Abstract



The memory model of a shared-memory multiprocessor is a contract between 
the designer and programmer of the multiprocessor. The sequential consistency
memory model specifies a total order among the memory (read and write) events 
performed at each processor. A trace of a memory system satisfies sequential 
consistency if there exists a total order of all memory events in the trace that 
is both consistent with the total order at each processor and has the property 
that every read event to a location returns the value of the last write to that 
location. 

Descriptions of shared-memory systems are typically parameterized by the 
number of processors, the number of memory locations, and the number of data 
values. It has been shown that even for finite parameter values, verifying sequen- 
tial consistency on general shared-memory systems is undecidable. We observe 
that, in practice, shared-memory systems satisfy the properties of causality and 
data independence. Causality is the property that values of read events flow 
from values of write events. Data independence is the property that all traces 
can be generated by renaming data values from traces where the written values 
are distinct from each other. If a causal and data independent system also has 
the property that the logical order of write events to each location is identical to 
their temporal order, then sequential consistency can be verified algorithmically. 
Specifically, we present a model checking algorithm to verify sequential consis- 
tency on such systems for a finite number of processors and memory locations 
and an arbitrary number of data values. 



1 Introduction 



Shared-memory multiprocessors are very complex computer systems. Multi- 
threaded programs running on shared-memory multiprocessors use an abstract 
view of the shared memory that is specified by a memory model. Examples 
of memory models for multiprocessors include sequential consistency [Lam79] , 
partial store ordering [WG99], and the Alpha memory model [Com98]. The 
implementation of the memory model, achieved by a protocol running either 
in hardware or software, is one of the most complex aspects of multiprocessor 
design. These protocols are commonly referred to as cache-coherence protocols. 
Since parallel programs running on such systems rely on the memory model for 
their correctness, it is important to implement the protocols correctly. However, 
since efficiency is important for the commercial viability of these systems, the
protocols are heavily optimized, making them prone to design errors. Formal
verification of cache-coherence protocols can detect these errors effectively.

Descriptions of cache-coherence protocols are typically parameterized by the 
number of processors, the number of memory locations, and the number of data 
values. Verifying parameterized systems for arbitrary values of these parame- 
ters is undecidable for nontrivial systems. Interactive theorem proving is one 
approach to parameterized verification. This approach is not automated and is 
typically expensive in terms of the required human effort. Another approach is
to model check a parameterized system for small values of the parameters. This 
is a good debugging technique that can find a number of errors prior to the more 
time-consuming effort of verification for arbitrary parameter values. In this pa- 
per, we present an automatic method based on model checking to verify that a 
cache-coherence protocol with fixed parameter values is correct with respect to 
the sequential consistency memory model. 

The sequential consistency memory model [Lam79] specifies a total order 
among the memory events (reads and writes) performed locally at each proces- 
sor. This total order at a processor is the order in which memory events occur 
at that processor. A trace of a memory system satisfies sequential consistency 
if there exists a total order of all memory events that is both consistent with 
the local total order at each processor, and has the property that every read to 
a location returns the latest (according to the total order) value written to that 
location. Surprisingly, verifying sequential consistency, even for fixed parame- 
ter values, is undecidable [AMP96]. Intuitively, this is because the witness total 
order could be quite different from the global temporal order of events for some 
systems. An event might need to be logically ordered after an event that occurs 
much later in a run. Hence any algorithm needs to keep track of a potentially 
unbounded history of a run. 

In this paper, we consider the problem of verifying that a shared-memory
system S(n, m, v) with n processors, m locations and v data values is
sequentially consistent. We present a method that can check sequential
consistency for any fixed n and m and for arbitrary v. The correctness of our
method depends on two assumptions: causality and data independence. The
property of causality arises from the observation that protocols do not conjure
up data values; data is injected into the system by the initial values stored
in the memory and by the writes performed by the processors. Therefore every
read operation r to location l is associated with either the initial value of l
or some write operation w to l that wrote the value read by r. The property of
data independence arises
from the observation that protocols do not examine data values; they just for- 
ward the data from one component of the system (cache or memory) to another. 
Since protocol behavior is not affected by the data values, we can restrict our 
attention, without loss of generality, to unambiguous runs in which the writ- 
ten data values to a location are distinct from each other and from the initial 
value. We have observed that these two assumptions are true of shared-memory
systems that occur in practice [LLG+90, KOH+94, BDH+99, BGM+00]. 

For a causal and unambiguous run, we can deduce the association between 
a read and the associated write just by looking at their data values. This leads 
to a vast simplification in the task of specifying the witness total order for 
sequential consistency. It suffices to specify for each location, a total order on 
the writes to that location. By virtue of the association of write events and 
read events, the total order on the write events can be extended to a partial 
order on all memory events (both reads and writes) to that location. If a 
read event r reads the value written by the write event w, the partial order 
puts r after w and all write events preceding w, and before all write events 
succeeding w. As described before, sequential consistency specifies a total order 
on the memory events for each processor. Thus, there are n total orders, one 
for each processor, and m partial orders, one for each location, imposed on the 
graph of memory events of a run. A necessary and sufficient condition for the 
run to be sequentially consistent is that this graph is acyclic. We further show 
that existence of a cycle in this graph implies the existence of a nice cycle in 
which no two processor edges (imposed by the memory model) are for the same 
processor and no two location edges (imposed by the write order) are for the 
same location. This implies that a nice cycle can have at most 2 × min({n, m})
edges; we call a nice cycle with 2 × k edges a k-nice cycle. Further, if the memory
system is symmetric with respect to processor and location ids, then processor 
and location edges occur in a certain canonical order in the nice cycle. These 
two observations drastically reduce the number of cycles for any search. 

We finally argue that a number of causal and data independent shared- 
memory systems occurring in practice also have the property that the witness 
write order at each location is simply the temporal order of the write events. In 
other words, a write event w is ordered before w' if w occurs before w'. We call
this a simple write order, and it is in fact the correct witness for a number of 
shared-memory systems. For cache-based shared-memory systems, the intuitive 
explanation is that at any time there is at most one cache with write privilege 
to a location. The write privilege moves from one cache to another with time. 
Hence, the logical timestamps [Lam78] of the writes to a location order them 
exactly according to their global temporal order. We show that the proof that 
a simple write order is a correct witness for a memory system can be performed 
by model checking [CE81, QS81]. Specifically, the proof for the memory system 
S(n, m, v) for fixed n and m and arbitrary v is broken into min({n, m}) model
checking lemmas, where the k-th lemma checks for the existence of canonical
k-nice cycles.

The rest of the paper is organized as follows. Sections 2 and 3 formalize
shared-memory systems and our assumptions of causality and data independence
about them. Section 4 defines the sequential consistency memory model.
Section 5 defines the notions of a witness and a constraint graph for an
unambiguous and causal run. Sections 6 and 7 show that it is sufficient to search for
canonical nice cycles in the constraint graph. Section 8 shows how to use model 
checking to detect canonical nice cycles in the constraint graphs of the runs of 
a memory system. Finally, we discuss related work in Section 9 and conclude 
in Section 10. 

2 Shared-memory systems 

Let $\mathbb{N}$ denote the set of positive integers. For any $n \geq 1$, let
$\mathbb{N}_n$ denote the set of positive integers up to $n$.

A memory system is parameterized by the number of processors, the number
of memory locations, and the number of data values. In a memory system with
$n$ processors, $m$ memory locations, and $v$ data values, read and write events
denoted by $R$ and $W$ can occur at any processor in $\mathbb{N}_n$, to any
location in $\mathbb{N}_m$, and have any data value in $\mathbb{N}_v$. Formally,
we define the following sets of events parameterized by the number of
processors $n$, the number of locations $m$, and the number of data values $v$,
where $n, m, v \geq 1$.

1. $E^r(n, m, v) = \{R\} \times \mathbb{N}_n \times \mathbb{N}_m \times \mathbb{N}_v$ is the set of read events.

2. $E^w(n, m, v) = \{W\} \times \mathbb{N}_n \times \mathbb{N}_m \times \mathbb{N}_v$ is the set of write events.

3. $E(n, m, v) = E^r(n, m, v) \cup E^w(n, m, v)$ is the set of memory events.

4. $E^a(n, m, v) \supseteq E(n, m, v)$ is the set of all events.

5. $E^a(n, m, v) \setminus E(n, m, v)$ is the set of internal events.

For every memory event $e = (a, b, c, d) \in E(n, m, v)$, we define $op(e) = a$,
$proc(e) = b$, $loc(e) = c$, and $data(e) = d$. The set of all finite sequences of
events in $E^a(n, m, v)$ is denoted by $E^a(n, m, v)^*$. A memory system
$S(n, m, v)$ is a regular subset of $E^a(n, m, v)^*$. A sequence
$\sigma \in S(n, m, v)$ is a run.

Consider any $\sigma \in E^a(n, m, v)^*$. We denote the length of $\sigma$ by
$|\sigma|$ and the $i$-th element of $\sigma$ by $\sigma(i)$. The set of indices
of the memory events in $\sigma$ is denoted by
$dom(\sigma) = \{1 \leq k \leq |\sigma| \mid \sigma(k) \in E(n, m, v)\}$. The
subsequence obtained by projecting $\sigma$ onto $dom(\sigma)$ is denoted by
$\bar{\sigma}$. If $\sigma \in S(n, m, v)$, the sequence $\bar{\sigma}$ is
a trace of $S(n, m, v)$. A trace of $S(n, m, v)$ for any $n, m, v \geq 1$ is a
trace of $S$. We define the following useful subsets of $dom(\sigma)$.

1. For all $1 \leq i \leq n$, the set of memory events by processor $i$ denoted by
$P(\sigma, i) = \{k \in dom(\sigma) \mid proc(\sigma(k)) = i\}$.
typedef Msg {m : {ACKS, ACKX}, a : N_m, d : N_v} ∪ {m : {INVAL}, a : N_m};
typedef CacheEntry {d : N_v, s : {INV, SHD, EXC}};
cache : array N_n of array N_m of CacheEntry;
inQ : array N_n of Queue(Msg);
owner : array N_m of N_n ∪ {0};

Initial predicate

∀i ∈ N_n, j ∈ N_m : (cache[i][j].s = SHD ∧ inQ[i].isEmpty ∧ owner[j] ≠ 0)

Events

(R, i, j, k)   cache[i][j].s ≠ INV ∧ cache[i][j].d = k →
    "no op"

(W, i, j, k)   cache[i][j].s = EXC →
    cache[i][j].d := k

(ACKX, i, j)   cache[i][j].s ≠ EXC ∧ owner[j] ≠ 0 →
    if owner[j] ≠ i then cache[owner[j]][j].s := INV;
    owner[j] := 0;
    for each (p ∈ N_n)
        if (p = i) then
            inQ[p] := append(inQ[p], (ACKX, j, cache[owner[j]][j].d))
        else if (p ≠ owner[j] ∧ cache[p][j].s ≠ INV) then
            inQ[p] := append(inQ[p], (INVAL, j))

(ACKS, i, j)   cache[i][j].s = INV ∧ owner[j] ≠ 0 →
    cache[owner[j]][j].s := SHD;
    owner[j] := 0;
    inQ[i] := append(inQ[i], (ACKS, j, cache[owner[j]][j].d));

(UPD, i)       ¬isEmpty(inQ[i]) →
    let msg = head(inQ[i]) in
    if (msg.m = INVAL) then
        cache[i][msg.a].s := INV
    else if (msg.m = ACKS) then {
        cache[i][msg.a] := (SHD, msg.d);
        owner[msg.a] := i
    } else {
        cache[i][msg.a] := (EXC, msg.d);
        owner[msg.a] := i
    }
    inQ[i] := tail(inQ[i])

Figure 1: Example of memory system
2. For all $1 \leq i \leq m$, the set of memory events to location $i$ denoted by
$L(\sigma, i) = \{k \in dom(\sigma) \mid loc(\sigma(k)) = i\}$.

3. For all $1 \leq i \leq m$, the set of write events to location $i$ denoted by
$L^w(\sigma, i) = \{k \in L(\sigma, i) \mid op(\sigma(k)) = W\}$.

4. For all $1 \leq i \leq m$, the set of read events to location $i$ denoted by
$L^r(\sigma, i) = \{k \in L(\sigma, i) \mid op(\sigma(k)) = R\}$.
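The index sets defined above can be sketched in Python. This is only an illustration under assumed conventions that are not part of the paper: events are tuples, memory events have the shape (op, proc, loc, data) with op "R" or "W", every other tuple is an internal event, and indices are 1-based as in the text.

```python
# Assumed encoding (not from the paper): memory events are 4-tuples
# (op, proc, loc, data) with op in {"R", "W"}; anything else is internal.

def is_memory_event(e):
    return len(e) == 4 and e[0] in ("R", "W")

def dom(sigma):
    """1-based indices of the memory events in the run sigma."""
    return {k for k in range(1, len(sigma) + 1) if is_memory_event(sigma[k - 1])}

def trace(sigma):
    """Projection of sigma onto dom(sigma), i.e. the trace of the run."""
    return [e for e in sigma if is_memory_event(e)]

def P(sigma, i):
    return {k for k in dom(sigma) if sigma[k - 1][1] == i}

def L(sigma, i):
    return {k for k in dom(sigma) if sigma[k - 1][2] == i}

def Lw(sigma, i):
    return {k for k in L(sigma, i) if sigma[k - 1][0] == "W"}

def Lr(sigma, i):
    return {k for k in L(sigma, i) if sigma[k - 1][0] == "R"}

# A short hypothetical run with two internal and two memory events.
run = [("ACKX", 1, 1), ("UPD", 1), ("W", 1, 1, 1), ("R", 2, 1, 0)]
```

On a trace, every index is a memory event, so dom covers the whole sequence and the same helpers apply unchanged.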

Example. Consider the memory system in Figure 1. It is a highly sim- 
plified model of the protocol used to maintain cache coherence within a single 
node in the Piranha chip multiprocessor system [BGM+00]. The system has 
three variables (cache, inQ and owner) and five events: the memory events
{R, W} and the internal events {ACKX, ACKS, UPD}. The variables inQ and 
owner need some explanation. For each processor i, there is an input queue 
inQ[i] where incoming messages are put. The type of inQ[i] is Queue. The 
operations isEmpty, head and tail are defined on Queue, and the operation 
append is defined on Queue x Msg. They have the obvious meanings and their 
definitions have been omitted in the figure. For each memory location j, either 
owner[j] = 0 or owner[j] contains the index of a processor. Each event is
associated with a guarded command. The memory events R and W are parameterized
by three parameters: processor i, location j and data value k. The internal
events ACKX and ACKS are parameterized by two parameters: processor i
and location j. The internal event UPD is parameterized by processor i. A
state is a valuation to the variables. An initial state is a state that satisfies 
the initial predicate. An event is enabled in a state if the guard of its guarded 
command is true in the state. The variables are initialized to an initial state 
and updated by nondeterministically choosing an enabled event and executing 
the guarded command corresponding to it. A run of the system is any finite 
sequence of events that can be executed starting from some initial state. 

A processor i can perform a read to location j if cache[i][j].s ∈ {SHD, EXC},
otherwise it requests owner[j] for shared access to location j. The processor
owner[j] is the last one to have received shared or exclusive access to location
j. The request by i has been abstracted away but the response of owner[j] is
modeled by the action ACKS[i][j], which sends an ACKS message containing the
data in location j to i and temporarily sets owner[j] to 0. Similarly, processor
i can perform a write to location j if cache[i][j].s = EXC, otherwise it requests
owner[j] for exclusive access to location j. The processor owner[j] responds
by sending an ACKX message to i and INVAL messages to all other processors
that have a valid copy of location j. The variable owner[j] is set to i when
processor i reads the ACKS or ACKX message from inQ[i] in the event UPD[i].
Note that new requests for j are blocked while owner[j] = 0. A processor i that
receives an INVAL message for location j sets cache[i][j].s to INV. ■
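To make the guarded commands of Figure 1 concrete, the following Python sketch executes them one event at a time. It is not the author's implementation. The class name MemorySystem, the initial data value 0, and the choice of processor 1 as initial owner are assumptions made for illustration, and the sketch assumes that ACKX and ACKS read the previous owner before owner[j] is reset to 0. The run replayed at the end is the example run given in Section 4.

```python
from collections import deque

INV, SHD, EXC = "INV", "SHD", "EXC"

class MemorySystem:  # hypothetical name; a sketch of the Figure 1 protocol
    def __init__(self, n, m, init=0, first_owner=1):
        procs, locs = range(1, n + 1), range(1, m + 1)
        self.n = n
        # Assumption: every cache starts shared with the same initial value.
        self.cache = {(i, j): {"d": init, "s": SHD} for i in procs for j in locs}
        self.inq = {i: deque() for i in procs}
        self.owner = {j: first_owner for j in locs}

    def step(self, event):
        """Execute one event; raise ValueError if its guard is false."""
        getattr(self, "_" + event[0].lower())(*event[1:])

    @staticmethod
    def _require(guard):
        if not guard:
            raise ValueError("event not enabled")

    def _r(self, i, j, k):
        c = self.cache[i, j]
        self._require(c["s"] != INV and c["d"] == k)  # a read changes nothing

    def _w(self, i, j, k):
        self._require(self.cache[i, j]["s"] == EXC)
        self.cache[i, j]["d"] = k

    def _ackx(self, i, j):
        old = self.owner[j]  # assumed: the old owner is read before the reset
        self._require(self.cache[i, j]["s"] != EXC and old != 0)
        data = self.cache[old, j]["d"]
        targets = [p for p in range(1, self.n + 1)
                   if p != i and p != old and self.cache[p, j]["s"] != INV]
        if old != i:
            self.cache[old, j]["s"] = INV
        self.owner[j] = 0
        self.inq[i].append(("ACKX", j, data))
        for p in targets:
            self.inq[p].append(("INVAL", j))

    def _acks(self, i, j):
        old = self.owner[j]
        self._require(self.cache[i, j]["s"] == INV and old != 0)
        self.cache[old, j]["s"] = SHD
        self.owner[j] = 0
        self.inq[i].append(("ACKS", j, self.cache[old, j]["d"]))

    def _upd(self, i):
        self._require(len(self.inq[i]) > 0)
        msg = self.inq[i].popleft()
        if msg[0] == "INVAL":
            self.cache[i, msg[1]]["s"] = INV
        else:
            state = SHD if msg[0] == "ACKS" else EXC
            self.cache[i, msg[1]] = {"d": msg[2], "s": state}
            self.owner[msg[1]] = i

run = [("ACKX", 1, 1), ("UPD", 1), ("W", 1, 1, 1), ("R", 2, 1, 0),
       ("UPD", 2), ("ACKS", 2, 1), ("UPD", 2), ("R", 2, 1, 1)]
system = MemorySystem(n=2, m=1)
for event in run:
    system.step(event)  # every guard holds, so no exception is raised
```

Processor 2 can still read the stale value 0 after the write by processor 1 because its INVAL message is waiting in inQ[2]; this is the behavior that makes the witness order differ from the temporal order of events.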
3 Causality and data independence



In this section, we formalize our assumptions on memory systems. Each
assumption is motivated by an observation about memory systems occurring in
practice.

Memory systems do not conjure up data values; they move around data 
values that were introduced by initial values or write events. For example, in
the memory system in Figure 1, only the write event W introduces a fresh data 
value in the system by updating the cache; the internal events ACKS, ACKX 
and UPD move data around and the read event R reads the data present in the 
cache. Therefore, the data value of a read operation must either be the initial 
value or the value introduced by a write event. We can now formally state our 
first assumption. 

Assumption 1 (Causality) There is a function $init$ mapping each trace of $S$
to a function in $\mathbb{N} \to \mathbb{N}$ such that for all $n, m, v \geq 1$,
traces $\tau$ of $S(n, m, v)$, and locations $1 \leq i \leq m$, if
$x \in L^r(\tau, i)$, then either $data(\tau(x)) = init(\tau)(i)$ or there
is $y \in L^w(\tau, i)$ such that $data(\tau(x)) = data(\tau(y))$.

A function $init$ satisfying Assumption 1 is called an initial function of the
parameterized memory system $S$. The initial function is used to model the initial
values of the locations in the memory system. In the remainder of this paper, 
we fix a particular initial function init for S. 
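As an illustration of Assumption 1, the check below tests the causality condition on a single trace. It is a minimal sketch under assumed conventions that are not from the paper: memory events are tuples (op, proc, loc, data), and the initial function for the trace is passed as a dictionary from location to initial value.

```python
def is_causal(trace, init):
    """True if every read returns the initial value of its location or a
    value written to that location somewhere in the trace."""
    written = {}  # location -> set of values written in the trace
    for op, _proc, loc, data in trace:
        if op == "W":
            written.setdefault(loc, set()).add(data)
    return all(
        data == init[loc] or data in written.get(loc, set())
        for op, _proc, loc, data in trace
        if op == "R"
    )
```

Note that the condition does not require the matching write to appear before the read in the trace; it only requires that some write with that value exists.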

Memory systems also have the property that control decisions are oblivious 
to the data values. A cache line carries along with the actual program data a 
few state bits for recording whether it is in shared, exclusive or invalid mode. 
Typically, actions do not depend on the value of the data in the cache line. 
For example, in the memory system shown in Figure 1, there are no predicates 
involving the data fields of the cache lines and the messages in any of the internal 
events of the system. In such systems, renaming the data values of a run results 
in yet another run of the system. Moreover, every run can be obtained by data 
value renaming from a run in which the initial value and values of write events 
to any location i are all distinct from each other. 

An unambiguous trace is one in which every write event to a location i 
has a value distinct from the initial value of i and the value of every other 
write to i. Thus, a read event can be paired with its source write event just by 
comparing data values. Formally, a trace $\tau$ of $S(n, m, v)$ is unambiguous if
for all $x \in L^w(\tau, i)$, we have $data(\tau(x)) \neq init(\tau)(i)$ and
$data(\tau(x)) \neq data(\tau(y))$ for all $y \in L^w(\tau, i) \setminus \{x\}$.
The run $\sigma$ is unambiguous if the trace $\bar{\sigma}$ is unambiguous.
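The definition of an unambiguous trace translates directly into a single pass over the write events. The sketch below assumes the same illustrative encoding as before, which is not part of the paper: events are tuples (op, proc, loc, data) and init is a dictionary from location to initial value.

```python
def is_unambiguous(trace, init):
    """True if every write to a location carries a value that differs from
    the initial value and from every other write to the same location."""
    seen = {}  # location -> values written so far
    for op, _proc, loc, data in trace:
        if op != "W":
            continue
        values = seen.setdefault(loc, set())
        if data == init[loc] or data in values:
            return False
        values.add(data)
    return True
```

Writes to different locations may share a value, because distinctness is only required per location.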

For all $m, v, v' \geq 1$, a function
$\lambda : \mathbb{N}_m \times \mathbb{N}_{v'} \to \mathbb{N}_v$ is called a
renaming function. Intuitively, the function $\lambda$ provides for each memory
location $c$ and data value $d$ the renamed data value $\lambda(c, d)$. Let
$\lambda^d$ be a function on $E(n, m, v')$ such that for all
$e = (a, b, c, d) \in E(n, m, v')$, we have
$\lambda^d(e) = (a, b, c, \lambda(c, d))$. The function $\lambda^d$ is extended
to sequences in $E(n, m, v')^*$ in the natural way. We can now formally state
our second assumption.

Assumption 2 (Data independence) For all $n, m, v \geq 1$, we have that $\tau$ is
a trace of $S(n, m, v)$ iff there is $v' \geq 1$, an unambiguous trace $\tau'$ of
$S(n, m, v')$ and a renaming function
$\lambda : \mathbb{N}_m \times \mathbb{N}_{v'} \to \mathbb{N}_v$ such that
$\tau = \lambda^d(\tau')$ and $init(\tau)(j) = \lambda(j, init(\tau')(j))$ for
all $1 \leq j \leq m$.

Assumption 2 is motivated by the handling of data in typical cache-coherence 
protocols. This assumption can be syntactically enforced on protocol descrip- 
tions by imposing restrictions on the operations allowed on variables that contain 
data values [Nal99]. For example, one restriction can be that no data variable 
appears in the guard expression of an internal event or in the control expression 
of a conditional. 
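The effect of a renaming function on a trace can be sketched as follows. The encoding of events as tuples (op, proc, loc, data), the helper name rename, and the particular renaming collapse are assumptions made for illustration, with 0 standing for the initial value. The example starts from an unambiguous trace and renames both written values to the same value, producing an ambiguous trace of the kind that Assumption 2 says can always be obtained this way.

```python
def rename(trace, lam):
    """Apply the renaming function lam(loc, data) to the data value of
    every event, leaving op, processor and location unchanged."""
    return [(op, proc, loc, lam(loc, data)) for op, proc, loc, data in trace]

# An unambiguous trace: the two writes to location 1 carry distinct values.
unambiguous = [("W", 1, 1, 1), ("R", 2, 1, 1), ("W", 2, 1, 2), ("R", 1, 1, 2)]

def collapse(loc, data):
    # Hypothetical renaming: every written value becomes 1, the initial
    # value 0 stays 0.
    return 1 if data > 0 else 0

renamed = rename(unambiguous, collapse)
```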

4 Sequential consistency 

Suppose $S(n, m, v)$ is a memory system for some $n, m, v \geq 1$. The sequential
consistency memory model [Lam79] is a correctness requirement on the traces
of $S(n, m, v)$. In this section, we define sequential consistency formally.

We first define the simpler notion of a trace $\tau$ of $S(n, m, v)$ being serial.
For all $1 \leq u \leq |\tau|$, let $lw(\tau, u)$ be the maximum element of the set
$\{1 \leq k \leq u \mid op(\tau(k)) = W \wedge loc(\tau(k)) = loc(\tau(u))\}$ if
the set is nonempty and 0 otherwise. In other words, the value of $lw(\tau, u)$
is the latest write event in $\tau$ to location $loc(\tau(u))$ occurring no later
than $u$. If no such write event exists, then $lw(\tau, u)$ is 0. In particular,
if $\tau(u)$ is a write event then $lw(\tau, u) = u$. The trace $\tau$ is serial
if for all locations $1 \leq i \leq m$ and $u \in L(\tau, i)$,

$data(\tau(u)) = init(\tau)(i)$, if $lw(\tau, u) = 0$

$data(\tau(u)) = data(\tau(lw(\tau, u)))$, if $lw(\tau, u) \neq 0$.

Thus, a sequence is serial if every read to a location i returns the value of 
the latest write to i if one exists. Moreover, all reads to location i without a 
preceding write to i must return the initial value of i. 
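The definitions of the last-write function and of a serial trace can be sketched in Python as follows. The sketch assumes an illustrative encoding that is not part of the paper: events are tuples (op, proc, loc, data), init is a dictionary from location to initial value, and indices are 1-based as in the text.

```python
def lw(trace, u):
    """Index of the latest write to loc(trace(u)) at or before u, else 0."""
    loc = trace[u - 1][2]
    for k in range(u, 0, -1):
        op, _proc, l, _data = trace[k - 1]
        if op == "W" and l == loc:
            return k
    return 0

def is_serial(trace, init):
    """Check the two defining equations of a serial trace at every index."""
    for u in range(1, len(trace) + 1):
        w = lw(trace, u)
        expected = init[trace[u - 1][2]] if w == 0 else trace[w - 1][3]
        if trace[u - 1][3] != expected:
            return False
    return True

# A read of the initial value, then a write, then a read of the written value.
serial_trace = [("R", 2, 1, 0), ("W", 1, 1, 1), ("R", 2, 1, 1)]
```

For a write at index u the check is trivially satisfied, since lw returns u itself.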

The sequential consistency memory model $M$ is a function that maps every
sequence of memory events $\tau \in E(n, m, v)^*$ and processor
$1 \leq i \leq n$ to a total order $M(\tau, i)$ on $P(\tau, i)$ defined as
follows: for all $u, v \in P(\tau, i)$, we have $(u, v) \in M(\tau, i)$ iff
$u < v$. A sequence $\tau$ is sequentially consistent if there is a
permutation $f$ on $\mathbb{N}_{|\tau|}$ such that the following conditions are
satisfied.

C1 For all $1 \leq u, v \leq |\tau|$ and $1 \leq i \leq n$, if
$(u, v) \in M(\tau, i)$ then $f(u) < f(v)$.

C2 The sequence
$\tau' = \tau_{f^{-1}(1)} \tau_{f^{-1}(2)} \ldots \tau_{f^{-1}(|\tau|)}$ is serial.

Intuitively, the sequence $\tau'$ is a permutation of the sequence $\tau$ such
that the event at index $u$ in $\tau$ is moved to index $f(u)$ in $\tau'$.
According to C1, this permutation must respect the total order $M(\tau, i)$ for
all $1 \leq i \leq n$. According to C2, the permuted sequence must be serial. A
run $\sigma \in S(n, m, v)$ is sequentially consistent if $\bar{\sigma}$
satisfies $M$. The memory system $S(n, m, v)$ is sequentially consistent iff
every run of $S(n, m, v)$ is sequentially consistent.
The memory system in Figure 1 is sequentially consistent. Here is an example
of a sequentially consistent run $\sigma$ of that memory system, the
corresponding trace $\tau$ of $\sigma$, and the sequence $\tau'$ obtained by
permuting $\tau$.

$\sigma$ = (ACKX, 1, 1), (UPD, 1), (W, 1, 1, 1), (R, 2, 1, 0), (UPD, 2),
(ACKS, 2, 1), (UPD, 2), (R, 2, 1, 1)

$\tau = \bar{\sigma}$ = (W, 1, 1, 1), (R, 2, 1, 0), (R, 2, 1, 1)

$\tau'$ = (R, 2, 1, 0), (W, 1, 1, 1), (R, 2, 1, 1)

Sequential consistency orders the event $\tau(2)$ before the event $\tau(3)$ at
processor 2. Let $f$ be the permutation on $\mathbb{N}_3$ defined by $f(1) = 2$,
$f(2) = 1$, and $f(3) = 3$. The sequence $\tau'$ is the permutation of $\tau$
under $f$. It is easy to check that both conditions C1 and C2 mentioned above
are satisfied.
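For very short traces, conditions C1 and C2 can be checked by enumerating all permutations. The sketch below is a brute-force illustration, not the model checking method of this paper, and its cost grows factorially with the trace length. The encoding of events as tuples (op, proc, loc, data) and the function names are assumptions. Applied to the example trace, it finds the same permutation $f$ as above.

```python
from itertools import permutations

def is_serial(trace, init):
    """Every read returns the latest preceding write to its location,
    or the initial value if there is none."""
    last = dict(init)  # location -> value of the latest write so far
    for op, _proc, loc, data in trace:
        if op == "W":
            last[loc] = data
        elif data != last[loc]:
            return False
    return True

def sequentially_consistent(trace, init):
    """Return (f^-1(1), ..., f^-1(n)) for some permutation f satisfying
    C1 and C2, or None if no such permutation exists."""
    n = len(trace)
    for order in permutations(range(1, n + 1)):  # order[p - 1] = f^-1(p)
        f = {u: p for p, u in enumerate(order, start=1)}
        c1 = all(
            f[u] < f[v]
            for u in range(1, n + 1)
            for v in range(u + 1, n + 1)
            if trace[u - 1][1] == trace[v - 1][1]  # same processor, u < v
        )
        if c1 and is_serial([trace[u - 1] for u in order], init):
            return order
    return None

tau = [("W", 1, 1, 1), ("R", 2, 1, 0), ("R", 2, 1, 1)]
witness = sequentially_consistent(tau, {1: 0})
```

The returned tuple (2, 1, 3) lists $f^{-1}(1), f^{-1}(2), f^{-1}(3)$, which corresponds to $f(1) = 2$, $f(2) = 1$, $f(3) = 3$.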

In order to prove that a run of a memory system is sequentially consistent, 
one needs to provide a reordering of the memory events of the run. This reorder- 
ing should be serial and should respect the total orders imposed by sequential 
consistency at each processor. Since the memory systems we consider in this 
paper are data independent, we only need to show sequential consistency for 
the unambiguous runs of the memory system. This reduction is stated formally 
in the following theorem. 

Theorem 4.1 For all $n, m \geq 1$, the following statements are equivalent.

1. For all $v \geq 1$, every trace of $S(n, m, v)$ is sequentially consistent.

2. For all $v \geq 1$, every unambiguous trace of $S(n, m, v)$ is sequentially
consistent.

Proof: The (1) $\Rightarrow$ (2) case is trivial.

((2) $\Rightarrow$ (1)) Let $\tau$ be a trace of $S(n, m, v)$ for some
$v \geq 1$. From Assumption 2 there is $v' \geq 1$, an unambiguous trace $\tau'$
of $S(n, m, v')$ and a renaming function
$\lambda : \mathbb{N}_m \times \mathbb{N}_{v'} \to \mathbb{N}_v$ such that
$\tau = \lambda^d(\tau')$ and $init(\tau)(j) = \lambda(j, init(\tau')(j))$ for
all $1 \leq j \leq m$. Since $\tau'$ is sequentially consistent, we know that
conditions C1 and C2 are satisfied by $\tau'$. It is not difficult to see that
both conditions C1 and C2 are satisfied by $\lambda^d(\tau')$ as well.
Therefore $\tau$ is sequentially consistent. ■

5 Witness 

Theorem 4.1 states that in order to prove sequential consistency for all runs in 
a memory system with n processors and m locations (and any number of data 
values), it suffices to prove sequential consistency for all unambiguous runs in the
system. In this section, we further reduce the problem of checking sequential 
consistency on an unambiguous run to the problem of detecting a cycle in a 
constraint graph. 



Consider a memory system $S(n, m, v)$ for some fixed $n, m, v \geq 1$. A witness
$\Omega$ for $S(n, m, v)$ maps every trace $\tau$ of $S(n, m, v)$ and location
$1 \leq i \leq m$ to a total order $\Omega(\tau, i)$ on the set of writes
$L^w(\tau, i)$ to location $i$. Then the total order $\Omega(\tau, i)$ on the
write events to location $i$ can be extended to a partial order
$\Omega^e(\tau, i)$ on all memory events (including read events) to location
$i$. If a read event $r$ reads the value written by the write event $w$, the
partial order puts $r$ after $w$ and all write events preceding $w$, and before
all write events succeeding $w$. Formally, for every location
$1 \leq i \leq m$, and $x, y \in L(\tau, i)$, we have that
$(x, y) \in \Omega^e(\tau, i)$ iff one of the following conditions holds.

1. $data(\tau(x)) = data(\tau(y))$, $op(\tau(x)) = W$, and $op(\tau(y)) = R$.

2. $data(\tau(x)) = init(\tau)(i)$ and $data(\tau(y)) \neq init(\tau)(i)$.

3. $\exists a, b \in L^w(\tau, i)$ such that $(a, b) \in \Omega(\tau, i)$,
$data(\tau(a)) = data(\tau(x))$, and $data(\tau(b)) = data(\tau(y))$.
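The three conditions above can be evaluated directly for an unambiguous trace. The following Python sketch is an illustration under assumed conventions that are not part of the paper: events are tuples (op, proc, loc, data), indices are 1-based, init maps locations to initial values, and the witness order for one location is given as a list of write indices in witness order.

```python
def extended_order(trace, init, loc, write_order):
    """Pairs (x, y) of the extended order for location `loc`.
    Assumes the trace is unambiguous, so a value identifies its write."""
    events = [k for k in range(1, len(trace) + 1) if trace[k - 1][2] == loc]
    op = lambda k: trace[k - 1][0]
    data = lambda k: trace[k - 1][3]
    # Position of each written value in the witness order.
    rank = {data(w): r for r, w in enumerate(write_order)}
    pairs = set()
    for x in events:
        for y in events:
            cond1 = data(x) == data(y) and op(x) == "W" and op(y) == "R"
            cond2 = data(x) == init[loc] and data(y) != init[loc]
            cond3 = (data(x) in rank and data(y) in rank
                     and rank[data(x)] < rank[data(y)])
            if cond1 or cond2 or cond3:
                pairs.add((x, y))
    return pairs

tau = [("W", 1, 1, 1), ("R", 2, 1, 0), ("R", 2, 1, 1)]
pairs = extended_order(tau, {1: 0}, loc=1, write_order=[1])
```

For the example trace of Section 4, the read of the initial value (index 2) is ordered before both the write (index 1) and the read that returns the written value (index 3).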

We now show that the relation $\Omega^e(\tau, i)$ is a partial order. First, we
need the following lemma about $\Omega^e(\tau, i)$.

Lemma 5.1 For all unambiguous traces $\tau$ of $S(n, m, v)$, locations
$1 \leq i \leq m$ and $r, s, t \in L(\tau, i)$, if $(r, s) \in \Omega^e(\tau, i)$,
then either $(r, t) \in \Omega^e(\tau, i)$ or $(t, s) \in \Omega^e(\tau, i)$.

Proof: Since $(r, s) \in \Omega^e(\tau, i)$, either
$data(\tau(s)) \neq init(\tau)(i)$ or there is $x \in L^w(\tau, i)$ such that
$data(\tau(s)) = data(\tau(x))$. Since $\tau$ is an unambiguous trace, we have
that $data(\tau(x)) \neq init(\tau)(i)$. Therefore, we get that
$data(\tau(s)) \neq init(\tau)(i)$ in both cases. If
$data(\tau(t)) = init(\tau)(i)$ we immediately get that
$(t, s) \in \Omega^e(\tau, i)$. So suppose $data(\tau(t)) \neq init(\tau)(i)$.
Since $\tau$ is unambiguous, there is $y \in L^w(\tau, i)$ such that
$data(\tau(t)) = data(\tau(y))$. We have three cases from the definition of
$(r, s) \in \Omega^e(\tau, i)$.

1. $data(\tau(r)) = data(\tau(s))$, $op(\tau(r)) = W$, and $op(\tau(s)) = R$.
Since $\Omega$ is a total order on $L^w(\tau, i)$, either
$(r, y) \in \Omega(\tau, i)$ or $(y, r) \in \Omega(\tau, i)$. In the first case,
we have $(r, t) \in \Omega^e(\tau, i)$. In the second case, we have
$(t, s) \in \Omega^e(\tau, i)$.

2. $data(\tau(r)) = init(\tau)(i)$ and $data(\tau(s)) \neq init(\tau)(i)$. We get
that $(r, t) \in \Omega^e(\tau, i)$.

3. $\exists a, b \in L^w(\tau, i)$ such that $(a, b) \in \Omega(\tau, i)$,
$data(\tau(a)) = data(\tau(r))$, and $data(\tau(b)) = data(\tau(s))$. Since
$\Omega$ is a total order on $L^w(\tau, i)$, either $(a, y) \in \Omega(\tau, i)$
or $(y, a) \in \Omega(\tau, i)$. In the first case, we have
$(r, t) \in \Omega^e(\tau, i)$. In the second case, we have by transitivity
$(y, b) \in \Omega(\tau, i)$ and therefore $(t, s) \in \Omega^e(\tau, i)$.



Lemma 5.2 For all unambiguous traces $\tau$ of $S(n, m, v)$ and locations
$1 \leq i \leq m$, we have that $\Omega^e(\tau, i)$ is a partial order.

Proof: We show that $\Omega^e(\tau, i)$ is irreflexive. In other words, for all
$1 \leq x \leq |\tau|$, we have that $(x, x) \notin \Omega^e(\tau, i)$. This is
an easy proof by contradiction by assuming $(x, x) \in \Omega^e(\tau, i)$ and
performing a case analysis over the three resulting conditions.

We show that $\Omega^e(\tau, i)$ is anti-symmetric. In other words, for all
$1 \leq x < y \leq |\tau|$, if $(x, y) \in \Omega^e(\tau, i)$ then
$(y, x) \notin \Omega^e(\tau, i)$. We do a proof by contradiction. Suppose both
$(x, y) \in \Omega^e(\tau, i)$ and $(y, x) \in \Omega^e(\tau, i)$. We reason as
in the proof of Lemma 5.1 to obtain $data(\tau(x)) \neq init(\tau)(i)$ and
$data(\tau(y)) \neq init(\tau)(i)$. Therefore there are
$a, b \in L^w(\tau, i)$ such that $data(\tau(a)) = data(\tau(x))$ and
$data(\tau(b)) = data(\tau(y))$. We perform the following case analysis.

1. $a = b$. Either $op(\tau(x)) = R$ and $op(\tau(y)) = R$, or
$op(\tau(x)) = W$ and $op(\tau(y)) = R$, or $op(\tau(x)) = R$ and
$op(\tau(y)) = W$. In the first case $(x, y) \notin \Omega^e(\tau, i)$ and
$(y, x) \notin \Omega^e(\tau, i)$. In the second case
$(y, x) \notin \Omega^e(\tau, i)$. In the third case
$(x, y) \notin \Omega^e(\tau, i)$.

2. $(a, b) \in \Omega(\tau, i)$. We have $data(\tau(x)) \neq data(\tau(y))$ since
$\tau$ is unambiguous. Since $\Omega(\tau, i)$ is a total order, we have
$(b, a) \notin \Omega(\tau, i)$. Therefore $(y, x) \notin \Omega^e(\tau, i)$.

3. $(b, a) \in \Omega(\tau, i)$. This case is symmetric to Case 2.

Finally, we show that $\Omega^e(\tau, i)$ is transitive. Suppose
$(x, y) \in \Omega^e(\tau, i)$ and $(y, z) \in \Omega^e(\tau, i)$. From
Lemma 5.1, either $(x, z) \in \Omega^e(\tau, i)$ or
$(z, y) \in \Omega^e(\tau, i)$. We have shown $\Omega^e(\tau, i)$ to be
anti-symmetric. Therefore $(x, z) \in \Omega^e(\tau, i)$. ■

5.1 Constraint graph 

Suppose $\tau$ is an unambiguous trace of $S(n, m, v)$. We have that
$M(\tau, i)$ is a total order on $P(\tau, i)$ for all $1 \leq i \leq n$ from the
definition of sequential consistency. For any witness $\Omega$, we also have
that $\Omega^e(\tau, j)$ is a partial order on $L(\tau, j)$ for all
$1 \leq j \leq m$ from Lemma 5.2. The union of the $n$ total orders
$M(\tau, i)$ and $m$ partial orders $\Omega^e(\tau, j)$ imposes a graph on
$dom(\tau)$. The acyclicity of this graph, for some witness $\Omega$, is a
necessary and sufficient condition for the trace $\tau$ to satisfy sequential
consistency. We define a function $G$ that for every witness $\Omega$ returns a
function $G(\Omega)$. The function $G(\Omega)$ maps every unambiguous trace
$\tau$ of $S(n, m, v)$ to the graph
$(dom(\tau), \bigcup_{1 \leq i \leq n} M(\tau, i) \cup \bigcup_{1 \leq j \leq m} \Omega^e(\tau, j))$.
The work of Gibbons and Korach [GK97] defines a constraint graph on the memory
events of a run that is similar to $G(\Omega)(\tau)$.
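The construction of the constraint graph and the acyclicity check can be sketched for a single unambiguous trace as follows. This is an illustration, not the model checking procedure developed later in the paper. The event encoding (op, proc, loc, data), the function names, and the representation of the witness as a dictionary from location to a list of write indices are assumptions; the topological sort uses the standard library module graphlib (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

def constraint_edges(trace, init, write_orders):
    """Edges of the constraint graph: processor edges from M and
    location edges from the extended witness order."""
    idx = range(1, len(trace) + 1)
    edges = set()
    for x in idx:
        for y in idx:
            ox, px, lx, dx = trace[x - 1]
            oy, py, ly, dy = trace[y - 1]
            if px == py and x < y:  # processor order M
                edges.add((x, y))
            if lx != ly:
                continue
            # Position of each written value in the witness order for lx.
            rank = {trace[w - 1][3]: r
                    for r, w in enumerate(write_orders.get(lx, []))}
            if ((dx == dy and ox == "W" and oy == "R")          # condition 1
                    or (dx == init[lx] and dy != init[lx])       # condition 2
                    or (dx in rank and dy in rank
                        and rank[dx] < rank[dy])):               # condition 3
                edges.add((x, y))
    return edges

def linearize(trace, init, write_orders):
    """A topological order of the constraint graph, or None on a cycle."""
    sorter = TopologicalSorter({k: set() for k in range(1, len(trace) + 1)})
    for x, y in constraint_edges(trace, init, write_orders):
        sorter.add(y, x)  # x must be ordered before y
    try:
        return list(sorter.static_order())
    except CycleError:
        return None

tau = [("W", 1, 1, 1), ("R", 2, 1, 0), ("R", 2, 1, 1)]
order = linearize(tau, {1: 0}, {1: [1]})
```

For the example trace, the graph has the edges 2 → 1, 2 → 3 and 1 → 3, so the only linearization is 2, 1, 3, which matches the sequence $\tau'$ of Section 4.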

Theorem 5.3 For all $n, m, v \geq 1$, every unambiguous trace of $S(n, m, v)$ is
sequentially consistent iff there is a witness $\Omega$ such that the graph
$G(\Omega)(\tau)$ is acyclic for every unambiguous trace $\tau$ of $S(n, m, v)$.

Proof: ($\Rightarrow$) Suppose $\tau$ is an unambiguous trace of $S(n, m, v)$.
Then $\tau$ satisfies sequential consistency. There is a permutation $f$ on
$\mathbb{N}_{|\tau|}$ such that conditions C1 and C2 are satisfied. For all
$1 \leq i \leq m$, define $\Omega(\tau, i)$ to be the total order on
$L^w(\tau, i)$ such that for all $x, y \in L^w(\tau, i)$, we have
$(x, y) \in \Omega(\tau, i)$ iff $f(x) < f(y)$. We show that the permutation $f$
is a linearization of the vertices in $G(\Omega)(\tau)$ that preserves all the
edges. In other words, if $(x, y) \in M(\tau, i)$ for some $1 \leq i \leq n$ or
$(x, y) \in \Omega^e(\tau, j)$ for some $1 \leq j \leq m$, then $f(x) < f(y)$.
If $(x, y) \in M(\tau, i)$ then we have from C1 that $f(x) < f(y)$. We show
below that if $(x, y) \in \Omega^e(\tau, j)$ then $f(x) < f(y)$.

Let $\tau' = \tau_{f^{-1}(1)} \tau_{f^{-1}(2)} \ldots \tau_{f^{-1}(|\tau|)}$. For
all $1 \leq u \leq |\tau|$ we have that $\tau(u) = \tau'(f(u))$. Since $\tau$ is
unambiguous $\tau'$ is also unambiguous. Suppose $a \in L^w(\tau, j)$ and
$x \in L(\tau, j)$. We show that if $data(\tau(a)) = data(\tau(x))$ then
$f(a) \leq f(x)$. We have that $f(a) \in L^w(\tau', j)$,
$f(x) \in L(\tau', j)$, $data(\tau(a)) = data(\tau'(f(a)))$, and
$data(\tau(x)) = data(\tau'(f(x)))$. Since $\tau'$ is unambiguous, either
$x = a$ or $op(\tau'(f(x))) = R$. In the first case $f(a) = f(x)$, and in the
second case $f(a) = lw(\tau', f(x))$ which implies that $f(a) < f(x)$.
Therefore $f(a) \leq f(x)$.

If $(x, y) \in \Omega^e(\tau, j)$ then we have three cases. In each case, we
show that $f(x) < f(y)$.

1. $data(\tau(x)) = data(\tau(y))$, $op(\tau(x)) = W$, and $op(\tau(y)) = R$. We
have $data(\tau'(f(x))) = data(\tau'(f(y)))$, $op(\tau'(f(x))) = W$, and
$op(\tau'(f(y))) = R$. Since $\tau'$ is unambiguous, we get that
$data(\tau'(f(y))) \neq init(\tau')(j)$. Therefore $f(x) = lw(\tau', f(y))$
which implies that $f(x) < f(y)$.

2. $data(\tau(x)) = init(\tau)(j)$ and $data(\tau(y)) \neq init(\tau)(j)$. Since
$x \neq y$ we have $f(x) \neq f(y)$. We show $f(x) < f(y)$ by contradiction.
Suppose $f(y) < f(x)$. Since $data(\tau(y)) \neq init(\tau)(j)$ there is
$b \in L^w(\tau, j)$ such that $data(\tau(b)) = data(\tau(y))$. Therefore we
have that $f(b) \leq f(y) < f(x)$. Therefore $f(b) \leq lw(\tau', f(x))$. Since
the trace $\tau'$ is unambiguous and $data(\tau'(f(x))) = init(\tau)(j)$ we have
a contradiction.

3. $\exists a, b \in L^w(\tau, j)$ such that $(a, b) \in \Omega(\tau, j)$,
$data(\tau(a)) = data(\tau(x))$, and $data(\tau(b)) = data(\tau(y))$. We show
$f(x) < f(y)$ by contradiction. Suppose $f(y) < f(x)$. We have that
$f(a) \leq f(x)$ and $f(b) \leq f(y)$. Since $(a, b) \in \Omega(\tau, j)$, we
have $f(a) < f(b)$ from the definition of $\Omega$. Thus we have
$f(a) < f(b) \leq f(y) < f(x)$. Therefore $f(a) \neq lw(\tau', f(x))$. Since
$\tau'$ is unambiguous and $data(\tau'(f(a))) = data(\tau'(f(x)))$ we have a
contradiction.

($\Leftarrow$) Suppose there is a witness $\Omega$ such that $G(\Omega)(\tau)$
is acyclic for all unambiguous traces $\tau$ of $S(n, m, v)$. Let $f$ be a
linearization of the vertices in $G(\Omega)(\tau)$ that respects all edges. In
other words, if $(x, y) \in M(\tau, i)$ for some $1 \leq i \leq n$ or
$(x, y) \in \Omega^e(\tau, j)$ for some $1 \leq j \leq m$, then $f(x) < f(y)$.
Then C1 is satisfied. Let $\tau'$ denote
$\tau_{f^{-1}(1)} \tau_{f^{-1}(2)} \ldots \tau_{f^{-1}(|\tau|)}$. We now show
that $\tau'$ is serial.

We have that $\tau'(x) = \tau(f^{-1}(x))$ for all $1 \leq x \leq |\tau'|$. Let
$loc(\tau'(x)) = j$ for some $1 \leq x \leq |\tau'|$. We show that if
$lw(\tau', x) = 0$ then $data(\tau'(x)) = init(\tau)(j)$, and if
$lw(\tau', x) \neq 0$ then $data(\tau'(x)) = data(\tau'(lw(\tau', x)))$. Thus,
whenever $lw(\tau', x) = lw(\tau', y)$ we have
$data(\tau'(x)) = data(\tau'(y))$. Here are the two cases.

1. $lw(\tau', x) = 0$. We have that $op(\tau'(x)) = R$, otherwise
$lw(\tau', x) = x \neq 0$ which is a contradiction. We prove by contradiction
that $data(\tau'(x)) = init(\tau)(j)$. Suppose
$data(\tau'(x)) \neq init(\tau)(j)$. Then there is $a \in L^w(\tau, j)$ such
that $data(\tau'(x)) = data(\tau(f^{-1}(x))) = data(\tau(a))$. Therefore we get
that $(a, f^{-1}(x)) \in \Omega^e(\tau, j)$ which means that $f(a) < x$. This
implies that $f(a) \leq lw(\tau', x)$ which is a contradiction.

2. $lw(\tau', x) \neq 0$. We show $data(\tau'(x)) = data(\tau'(lw(\tau', x)))$.
If $op(\tau'(x)) = W$, then $lw(\tau', x) = x$ and
$data(\tau'(x)) = data(\tau'(lw(\tau', x)))$. Suppose $op(\tau'(x)) = R$. Then
there is $a \in L^w(\tau, j)$ such that
$data(\tau'(x)) = data(\tau(f^{-1}(x))) = data(\tau(a))$. Therefore
$(a, f^{-1}(x)) \in \Omega^e(\tau, j)$ which means that $f(a) < x$. Therefore
$f(a) \leq lw(\tau', x)$. Suppose $f(a) < lw(\tau', x)$. Then there is
$b \in L^w(\tau, j)$ such that $f(a) < f(b) < x$. Since $f(a) < f(b)$, we have
$(a, b) \in \Omega^e(\tau, j)$. Therefore
$(f^{-1}(x), b) \in \Omega^e(\tau, j)$. This means that $x < f(b)$ which is a
contradiction. Therefore $f(a) = lw(\tau', x)$.



5.2 Simple witness 

Theorems 4.1 and 5.3 suggest that the memory system $S(n,m,v)$ can be proved
sequentially consistent as follows. We produce for each $v' \ge 1$ a witness $\Omega$ for
$S(n,m,v')$ and show for every unambiguous trace $\tau$ of $S(n,m,v')$ that the graph
$G(\Omega)(\tau)$ is acyclic. But the construction of the witness is still left to the user.
In this section, we argue that a simple witness, which orders the write events
to a location exactly in the order in which they occur, suffices for a number of
memory systems occurring in practice. Formally, a witness $\Omega$ is simple if for all
traces $\tau$ of $S(n,m,v)$ and locations $1 \le i \le m$, we have $(x,y) \in \Omega(\tau,i)$ iff $x < y$
for all $x,y \in L^w(\tau,i)$.
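The simple witness makes the constraint graph directly computable from a trace. The following Python fragment is a minimal sketch under stated assumptions, not part of the formal development: events are encoded as tuples `(op, proc, loc, data)`, vertices are 1-based trace indices, the trace is assumed unambiguous, and the helper names `constraint_edges` and `is_acyclic` are hypothetical. It collects the edges of $M(\tau,i)$ and of $\Omega^e(\tau,j)$ (the three cases of the definition, with $\Omega$ taken to be the temporal order of writes) and then tests acyclicity.

```python
def constraint_edges(trace, init):
    """Edges of the constraint graph under the simple witness.

    trace: list of events (op, proc, loc, data); vertices are 1-based indices.
    init:  dict mapping each location to its initial value.
    The trace is assumed to be unambiguous.
    """
    idx = range(1, len(trace) + 1)
    edges = set()
    for x in idx:
        opx, px, lx, dx = trace[x - 1]
        for y in idx:
            opy, py, ly, dy = trace[y - 1]
            if x == y:
                continue
            # M(tau, i): both events at the same processor, x earlier.
            if px == py and x < y:
                edges.add((x, y))
            if lx != ly:
                continue
            # Writes to this location as (index, data) pairs.
            writes = [(a, trace[a - 1][3]) for a in idx
                      if trace[a - 1][0] == "W" and trace[a - 1][2] == lx]
            case1 = dx == dy and opx == "W" and opy == "R"
            case2 = dx == init[lx] and dy != init[lx]
            # Simple witness: write a is ordered before write b iff a < b.
            case3 = any(a < b and da == dx and db == dy
                        for a, da in writes for b, db in writes)
            if case1 or case2 or case3:
                edges.add((x, y))
    return edges


def is_acyclic(num_vertices, edges):
    """Kahn's algorithm: True iff the directed graph has no cycle."""
    indeg = {v: 0 for v in range(1, num_vertices + 1)}
    succ = {v: [] for v in indeg}
    for x, y in edges:
        succ[x].append(y)
        indeg[y] += 1
    ready = [v for v, d in indeg.items() if d == 0]
    seen = 0
    while ready:
        v = ready.pop()
        seen += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return seen == num_vertices
```

For a two-event trace in which processor 1 writes 2 to location 1 and processor 2 then reads 2 from it, the only edge is the write-to-read edge and the graph is acyclic. The quadratic enumeration is chosen for readability rather than efficiency.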

Consider the memory system of Figure 1. We argue informally that the 
simple witness is a good witness for this memory system. Permission to perform 
writes flows from one cache to another by means of the ACKX message. Note 
that for each location j, the variable owner[j] is set to 0 (which is not the id
of any processor) when an ACKX message is generated. When the ACKX
message is received at the destination (by the UPD event), the destination
moves to EXC state and sets owner[j] to the destination id. A new ACKX
message is generated only when owner[j] $\ne$ 0. Thus, the memory system has
the property that each memory location can be held in EXC state by at most 
one cache. Moreover, writes to the location j can happen only when the cache 
has the location in EXC state. Therefore, at most one cache can be performing 
writes to a memory location. This indicates that the logical order of the write 
events is the same as their temporal order. In other words, the simple witness 
is the correct witness for demonstrating that a run is sequentially consistent. 

In general, for any memory system in which at any time at most one processor
can perform write events to a location, the simple witness is very likely to
be the correct witness. Most memory systems occurring in practice [LLG+90,
KOH+94, BDH+99, BGM+00] have this property. In Section 8, we describe
a model checking algorithm to verify the correctness of a memory system with
respect to the simple witness. If the simple witness is indeed the desired witness
and the memory system is designed correctly, then our algorithm will be able to
verify its correctness. Otherwise, it will produce an error trace suggesting to the
user that either there is an error in the memory system or the simple witness is
not a correct witness. Thus our method for checking sequential consistency is
clearly sound. We have argued that it is also complete on most shared-memory
systems that occur in practice.

6 Nice cycle reduction 

For some $n,m,v \ge 1$, let $S(n,m,v)$ be a memory system and $\Omega$ a witness for
it. Let $\tau$ be an unambiguous trace of $S(n,m,v)$. In this section, we begin
our quest for a method to detect cycles in $G(\Omega)(\tau)$. We show that it suffices
to detect a special class of cycles called nice cycles. In Section 7, we further
reduce our search for cycles to the class of canonical nice cycles. In Section 8,
we will show that detection of canonical nice cycles can be performed by model
checking.

We fix some $k \ge 1$ and use the symbol $\oplus$ to denote addition over the additive
group with elements $\mathbb{N}_k$ and identity element $k$. A $k$-nice cycle in $G(\Omega)(\tau)$ is
a sequence $u_1,v_1,\ldots,u_k,v_k$ of distinct vertices in $\mathbb{N}_{|\tau|}$ such that the following
conditions are true.

1. For all $1 \le x \le k$, we have $(u_x,v_x) \in M(\tau,i)$ for some $1 \le i \le n$ and
$(v_x,u_{x\oplus 1}) \in \Omega^e(\tau,j)$ for some $1 \le j \le m$.

2. For all $1 \le x < y \le k$ and for all $1 \le i,j \le n$, if $(u_x,v_x) \in M(\tau,i)$ and
$(u_y,v_y) \in M(\tau,j)$ then $i \ne j$.

3. For all $1 \le x < y \le k$ and for all $1 \le i,j \le m$, if $(v_x,u_{x\oplus 1}) \in \Omega^e(\tau,i)$ and
$(v_y,u_{y\oplus 1}) \in \Omega^e(\tau,j)$ then $i \ne j$.

In a $k$-nice cycle, no two edges belong to the relation $M(\tau,i)$ for any processor
$i$. Similarly, no two edges belong to the relation $\Omega^e(\tau,j)$ for any location $j$. The
above definition also implies that if a cycle is $k$-nice then $k \le \min(\{n,m\})$.
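As an illustration of the index arithmetic and of conditions 2 and 3, the following Python sketch may help. It is written under assumptions: the function names `oplus` and `labels_are_nice` are hypothetical, and a candidate cycle is summarized only by the processor id of each $M$ edge and the location id of each $\Omega^e$ edge.

```python
def oplus(x, y, k):
    """Addition on {1, ..., k} in which k acts as the identity element."""
    return (x + y - 1) % k + 1


def labels_are_nice(proc_labels, loc_labels):
    """Conditions 2 and 3 of a k-nice cycle.

    proc_labels[x-1] is the processor of the edge (u_x, v_x);
    loc_labels[x-1] is the location of the edge (v_x, u_{x (+) 1}).
    All processor labels must differ, and all location labels must differ.
    """
    k = len(proc_labels)
    return (len(loc_labels) == k
            and len(set(proc_labels)) == k
            and len(set(loc_labels)) == k)
```

Because the $k$ processor labels are pairwise distinct and drawn from $\mathbb{N}_n$, and likewise the location labels from $\mathbb{N}_m$, the bound $k \le \min(\{n,m\})$ follows by counting.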

Theorem 6.1 If the graph $G(\Omega)(\tau)$ has a cycle, then it has a $k$-nice cycle for
some $k$ such that $1 \le k \le \min(\{n,m\})$.

Proof: Suppose $G(\Omega)(\tau)$ has no $k$-nice cycles but does have a cycle. Consider
the shortest such cycle $u_1,\ldots,u_l$ where $l > 1$. For this proof, we denote by
$\oplus$ addition over the additive group with elements $\mathbb{N}_l$ and identity element $l$.
Then for all $1 \le x \le l$ either $(u_x,u_{x\oplus 1}) \in M(\tau,i)$ for some $1 \le i \le n$ or
$(u_x,u_{x\oplus 1}) \in \Omega^e(\tau,i)$ for some $1 \le i \le m$.

Since the cycle $u_1,\ldots,u_l$ is not $k$-nice for any $k$, there are $1 \le a < b \le l$ such
that either (1) $(u_a,u_{a\oplus 1}) \in M(\tau,i)$ and $(u_b,u_{b\oplus 1}) \in M(\tau,i)$ for some $1 \le i \le n$,
or (2) $(u_a,u_{a\oplus 1}) \in \Omega^e(\tau,i)$ and $(u_b,u_{b\oplus 1}) \in \Omega^e(\tau,i)$ for some $1 \le i \le m$.

Case (1). We have from the definition of $M$ that $u_a < u_{a\oplus 1}$ and $u_b < u_{b\oplus 1}$.
Either $u_a < u_b$ or $u_b < u_a$. If $u_a < u_b$ then $u_a < u_{b\oplus 1}$, or equivalently $(u_a,u_{b\oplus 1}) \in M(\tau,i)$.
If $u_b < u_a$ then $u_b < u_{a\oplus 1}$, or equivalently $(u_b,u_{a\oplus 1}) \in M(\tau,i)$. In both cases, we have a
contradiction since the cycle can be made shorter.

Case (2). From Lemma 5.1, either $(u_a,u_{b\oplus 1}) \in \Omega^e(\tau,i)$ or $(u_b,u_{a\oplus 1}) \in \Omega^e(\tau,i)$.
In both cases, we have a contradiction since the cycle can be made shorter. ■

7 Symmetry reduction 

Suppose $S(n,m,v)$ is a memory system for some $n,m,v \ge 1$. In this section,
we use symmetry arguments to further reduce the class of cycles that need
to be detected in constraint graphs. Each $k$-nice cycle has $2 \times k$ edges with
one edge each for $k$ different processors and $k$ different locations. These edges
can potentially occur in any order yielding a set of isomorphic cycles. But
if the memory system $S(n,m,v)$ is symmetric with respect to processor and
memory location ids, presence of any one of the isomorphic nice cycles implies
the existence of a nice cycle in which the edges are arranged in a canonical order.
Thus, it suffices to search for a cycle with edges in a canonical order.

We discuss processor symmetry in Section 7.1 and location symmetry in 
Section 7.2. We combine processor and location symmetry to demonstrate the 
reduction from nice cycles to canonical nice cycles in Section 7.3. 

7.1 Processor symmetry 

For any permutation $\lambda$ on $\mathbb{N}_n$, the function $\lambda^p$ on $E(n,m,v)$ permutes the
processor ids of events according to $\lambda$. Formally, for all $e = (a,b,c,d) \in E(n,m,v)$,
we define $\lambda^p(e) = (a, \lambda(b), c, d)$. The function $\lambda^p$ is extended to sequences in
$E(n,m,v)^*$ in the natural way.

Assumption 3 (Processor symmetry) For every permutation $\lambda$ on $\mathbb{N}_n$ and
for all traces $\tau$ of the memory system $S(n,m,v)$, we have that $\lambda^p(\tau)$ is a trace
of $S(n,m,v)$ and $\mathit{init}(\lambda^p(\tau)) = \mathit{init}(\tau)$.

We argue informally that the memory system in Figure 1 satisfies Assumption 3.
The operations performed by the various parameterized actions on the
state variables that store processor ids are symmetric. Suppose $s$ is a state of
the system. We denote by $\lambda^p(s)$ the state obtained by permuting the values
of variables that store processor ids according to $\lambda$. Then, for example, if the
action $\mathit{UPD}(i)$ in some state $s$ yields state $t$, then the action $\mathit{UPD}(\lambda(i))$ in state
$\lambda^p(s)$ yields the state $\lambda^p(t)$. Thus, from any run $\sigma$ we can construct another run
$\lambda^p(\sigma)$. If a shared-memory system is described with symmetric types, such as
scalarsets [ID96], used to model variables containing processor ids, then it has
the property of processor symmetry by construction.

The following lemma states that the sequential consistency memory model
is symmetric with respect to processor ids. It states that two events in a trace
$\tau$ ordered by sequential consistency remain ordered under any permutation of
processor ids.

Lemma 7.1 Suppose $\lambda$ is a permutation on $\mathbb{N}_n$. Suppose $\tau$ and $\tau'$ are traces
of $S(n,m,v)$ such that $\tau' = \lambda^p(\tau)$. Then for all $1 \le x,y \le |\tau|$, and for all
$1 \le i \le n$, we have that $(x,y) \in M(\tau,i)$ iff $(x,y) \in M(\tau',\lambda(i))$.

Proof: For all $1 \le x,y \le |\tau|$ and for all $1 \le i \le n$, we have that

$$
\begin{array}{ll}
 & (x,y) \in M(\tau,i) \\
\Leftrightarrow & \mathit{proc}(\tau(x)) = \mathit{proc}(\tau(y)) = i \text{ and } x < y \\
\Leftrightarrow & \mathit{proc}(\tau'(x)) = \mathit{proc}(\tau'(y)) = \lambda(i) \text{ and } x < y \\
\Leftrightarrow & (x,y) \in M(\tau',\lambda(i)).
\end{array}
$$

■
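Lemma 7.1 can be checked on small instances with a short Python sketch. The encoding is an assumption made for illustration: events are tuples `(op, proc, loc, data)`, a permutation is a dictionary on processor ids, and the helper names `permute_procs` and `M` are hypothetical.

```python
def permute_procs(trace, lam):
    """Apply lambda^p: rename the processor id of every event."""
    return [(op, lam[p], loc, d) for (op, p, loc, d) in trace]


def M(trace, i):
    """Pairs (x, y) of 1-based indices at processor i with x < y."""
    n = len(trace)
    return {(x, y)
            for x in range(1, n + 1)
            for y in range(x + 1, n + 1)
            if trace[x - 1][1] == i and trace[y - 1][1] == i}
```

For any trace `t` and permutation `lam`, the lemma predicts `M(t, i) == M(permute_procs(t, lam), lam[i])` for every processor `i`, since renaming processors does not change event positions.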

The following lemma states that the partial order $\Omega^e$ obtained from a simple
witness $\Omega$ is symmetric with respect to processor ids. It states that two events to
location $i$ ordered by $\Omega^e(\tau,i)$ in a trace $\tau$ remain ordered under any permutation
of processor ids.

Lemma 7.2 Suppose $\Omega$ is a simple witness for the memory system $S(n,m,v)$
and $\lambda$ is a permutation on $\mathbb{N}_n$. Suppose $\tau$ and $\tau'$ are unambiguous traces of
$S(n,m,v)$ such that $\tau' = \lambda^p(\tau)$. Then for all $1 \le x,y \le |\tau|$ and for all $1 \le i \le m$,
we have that $(x,y) \in \Omega^e(\tau,i)$ iff $(x,y) \in \Omega^e(\tau',i)$.

Proof: We have $(x,y) \in \Omega(\tau,i)$ iff $x < y$ iff $(x,y) \in \Omega(\tau',i)$. From the definition
of $\Omega^e(\tau,i)$ we have the following three cases.

1. $\mathit{data}(\tau(x)) = \mathit{data}(\tau(y))$, $\mathit{op}(\tau(x)) = W$, $\mathit{op}(\tau(y)) = R$ iff $\mathit{data}(\tau'(x)) =
\mathit{data}(\tau'(y))$, $\mathit{op}(\tau'(x)) = W$, $\mathit{op}(\tau'(y)) = R$.

2. $\mathit{data}(\tau(x)) = \mathit{init}(\tau)(i)$ and $\mathit{data}(\tau(y)) \ne \mathit{init}(\tau)(i)$ iff $\mathit{data}(\tau'(x)) =
\mathit{init}(\tau')(i)$ and $\mathit{data}(\tau'(y)) \ne \mathit{init}(\tau')(i)$.

3. $\exists a,b \in L^w(\tau,i)$ such that $a < b$, $\mathit{data}(\tau(a)) = \mathit{data}(\tau(x))$, $\mathit{data}(\tau(b)) =
\mathit{data}(\tau(y))$ iff $\exists a,b \in L^w(\tau',i)$ such that $a < b$, $\mathit{data}(\tau'(a)) = \mathit{data}(\tau'(x))$,
$\mathit{data}(\tau'(b)) = \mathit{data}(\tau'(y))$.



7.2 Location symmetry 

For any permutation $\lambda$ on $\mathbb{N}_m$, the function $\lambda^l$ on $E(n,m,v)$ permutes the
location ids of events according to $\lambda$. Formally, for all $e = (a,b,c,d) \in E(n,m,v)$,
we define $\lambda^l(e) = (a, b, \lambda(c), d)$. The function $\lambda^l$ is extended to sequences in
$E(n,m,v)^*$ in the natural way.

Assumption 4 (Location symmetry) For every permutation $\lambda$ on $\mathbb{N}_m$ and
for all traces $\tau$ of the memory system $S(n,m,v)$, we have that $\lambda^l(\tau)$ is a trace
of $S(n,m,v)$ and $\mathit{init}(\lambda^l(\tau)) \circ \lambda = \mathit{init}(\tau)$.

We can argue informally that the memory system in Figure 1 satisfies Assumption 4
also. The operations performed by the various parameterized actions
on the state variables that store location ids are symmetric. Suppose $s$ is a state
of the system. We denote by $\lambda^l(s)$ the state obtained by permuting the values
of variables that store location ids according to $\lambda$. Then, for example, if the
action $\mathit{UPD}(i)$ in some state $s$ yields state $t$, then the action $\mathit{UPD}(\lambda(i))$ in state
$\lambda^l(s)$ yields the state $\lambda^l(t)$. If scalarsets are used for modeling variables containing
location ids, the shared-memory system will have the property of location
symmetry by construction.
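The condition $\mathit{init}(\lambda^l(\tau)) \circ \lambda = \mathit{init}(\tau)$ in Assumption 4 says that the initial value travels with the location when locations are renamed. A minimal Python sketch, assuming the tuple encoding `(op, proc, loc, data)` and a hypothetical helper name `permute_locs`, makes this explicit.

```python
def permute_locs(trace, init, lam):
    """Apply lambda^l to a trace and carry the initial values along.

    lam:  dict that is a permutation of the location ids.
    init: dict mapping each location to its initial value.
    Returns the renamed trace and an init' with init'[lam[j]] == init[j].
    """
    new_trace = [(op, p, lam[loc], d) for (op, p, loc, d) in trace]
    new_init = {lam[j]: v for j, v in init.items()}
    return new_trace, new_init
```

Only the third component of each event changes, so the per-processor order of events is untouched by the renaming.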

The following lemma states that the sequential consistency memory model
is symmetric with respect to location ids. It states that two events in a trace
$\tau$ ordered by sequential consistency remain ordered under any permutation of
location ids.

Lemma 7.3 Suppose $\lambda$ is a permutation on $\mathbb{N}_m$. Suppose $\tau$ and $\tau'$ are traces
of $S(n,m,v)$ such that $\tau' = \lambda^l(\tau)$. Then for all $1 \le x,y \le |\tau|$, and for all
$1 \le i \le n$, we have that $(x,y) \in M(\tau,i)$ iff $(x,y) \in M(\tau',i)$.

Proof: For all $1 \le x,y \le |\tau|$ and for all $1 \le i \le n$, we have that

$$
\begin{array}{ll}
 & (x,y) \in M(\tau,i) \\
\Leftrightarrow & \mathit{proc}(\tau(x)) = \mathit{proc}(\tau(y)) = i \text{ and } x < y \\
\Leftrightarrow & \mathit{proc}(\tau'(x)) = \mathit{proc}(\tau'(y)) = i \text{ and } x < y \\
\Leftrightarrow & (x,y) \in M(\tau',i).
\end{array}
$$

■

The following lemma states that the partial order $\Omega^e$ obtained from a simple
witness $\Omega$ is symmetric with respect to location ids. It states that two events to
location $i$ ordered by $\Omega^e(\tau,i)$ in a trace $\tau$ remain ordered under any permutation
of location ids.

Lemma 7.4 Suppose $\Omega$ is a simple witness for the memory system $S(n,m,v)$
and $\lambda$ is a permutation on $\mathbb{N}_m$. Suppose $\tau$ and $\tau'$ are unambiguous traces of
$S(n,m,v)$ such that $\tau' = \lambda^l(\tau)$. Then for all $1 \le x,y \le |\tau|$ and for all $1 \le i \le m$,
we have that $(x,y) \in \Omega^e(\tau,i)$ iff $(x,y) \in \Omega^e(\tau',\lambda(i))$.

Proof: We have $(x,y) \in \Omega(\tau,i)$ iff $x < y$ iff $(x,y) \in \Omega(\tau',\lambda(i))$. From the
definition of $\Omega^e(\tau,i)$ we have the following three cases.

1. $\mathit{data}(\tau(x)) = \mathit{data}(\tau(y))$, $\mathit{op}(\tau(x)) = W$, $\mathit{op}(\tau(y)) = R$ iff $\mathit{data}(\tau'(x)) =
\mathit{data}(\tau'(y))$, $\mathit{op}(\tau'(x)) = W$, $\mathit{op}(\tau'(y)) = R$.

2. $\mathit{data}(\tau(x)) = \mathit{init}(\tau)(i)$ and $\mathit{data}(\tau(y)) \ne \mathit{init}(\tau)(i)$ iff $\mathit{data}(\tau'(x)) =
\mathit{init}(\tau')(\lambda(i))$ and $\mathit{data}(\tau'(y)) \ne \mathit{init}(\tau')(\lambda(i))$.

3. $\exists a,b \in L^w(\tau,i)$ where $a < b$, $\mathit{data}(\tau(a)) = \mathit{data}(\tau(x))$, and $\mathit{data}(\tau(b)) =
\mathit{data}(\tau(y))$ iff $\exists a,b \in L^w(\tau',\lambda(i))$ where $a < b$, $\mathit{data}(\tau'(a)) = \mathit{data}(\tau'(x))$,
and $\mathit{data}(\tau'(b)) = \mathit{data}(\tau'(y))$.

7.3 Combining processor and location symmetry

We fix some $k \ge 1$ and use the symbol $\oplus$ to denote addition over the additive
group with elements $\mathbb{N}_k$ and identity element $k$. A $k$-nice cycle $u_1,v_1,\ldots,u_k,v_k$
is canonical if $(u_x,v_x) \in M(\tau,x)$ and $(v_x,u_{x\oplus 1}) \in \Omega^e(\tau,x \oplus 1)$ for all $1 \le x \le k$.
In other words, the processor edges in a canonical nice cycle are arranged
in increasing order of processor ids. Similarly, the location edges are arranged
in increasing order of location ids. The following theorem claims that if the
constraint graph of a run has a nice cycle then there is some run with a canonical
nice cycle as well.

Theorem 7.5 Suppose $\Omega$ is a simple witness for the memory system $S(n,m,v)$.
Let $\tau$ be an unambiguous trace of $S(n,m,v)$. If the graph $G(\Omega)(\tau)$ has a $k$-nice
cycle, then there is an unambiguous trace $\tau''$ of $S(n,m,v)$ such that $G(\Omega)(\tau'')$
has a canonical $k$-nice cycle.

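Proof: Let $u_1,v_1,\ldots,u_k,v_k$ be a $k$-nice cycle in $G(\Omega)(\tau)$. Let $1 \le i_1,\ldots,i_k \le n$
and $1 \le j_1,\ldots,j_k \le m$ be such that $(u_x,v_x) \in M(\tau,i_x)$ and $(v_x,u_{x\oplus 1}) \in
\Omega^e(\tau,j_{x\oplus 1})$ for all $1 \le x \le k$. Let $\alpha$ be a permutation on $\mathbb{N}_n$ that maps $i_x$ to
$x$ for all $1 \le x \le k$. Then from Assumption 3 there is a trace $\tau'$ of $S(n,m,v)$
such that $\tau' = \alpha^p(\tau)$. Let $\beta$ be a permutation on $\mathbb{N}_m$ that maps $j_x$ to $x$ for all
$1 \le x \le k$. Then from Assumption 4 there is a trace $\tau''$ of $S(n,m,v)$ such that
$\tau'' = \beta^l(\tau')$. For all $1 \le x \le k$, we have that

$$
\begin{array}{lll}
 & (u_x,v_x) \in M(\tau,i_x) & \\
\Leftrightarrow & (u_x,v_x) \in M(\tau',\alpha(i_x)) = M(\tau',x) & \text{from Lemma 7.1} \\
\Leftrightarrow & (u_x,v_x) \in M(\tau'',x) & \text{from Lemma 7.3.}
\end{array}
$$

For all $1 \le x \le k$, we also have that

$$
\begin{array}{lll}
 & (v_x,u_{x\oplus 1}) \in \Omega^e(\tau,j_{x\oplus 1}) & \\
\Leftrightarrow & (v_x,u_{x\oplus 1}) \in \Omega^e(\tau',j_{x\oplus 1}) & \text{from Lemma 7.2} \\
\Leftrightarrow & (v_x,u_{x\oplus 1}) \in \Omega^e(\tau'',\beta(j_{x\oplus 1})) = \Omega^e(\tau'',x \oplus 1) & \text{from Lemma 7.4.}
\end{array}
$$

Therefore $u_1,v_1,\ldots,u_k,v_k$ is a canonical $k$-nice cycle in $G(\Omega)(\tau'')$. ■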
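
The permutations $\alpha$ and $\beta$ used in the proof are easy to construct explicitly. The following Python sketch is an illustration under assumptions: the function name `canonical_permutation` is hypothetical, and the ids not fixed by the proof are completed in increasing order, which is one arbitrary choice among many.

```python
def canonical_permutation(labels, size):
    """Permutation on {1, ..., size} that sends labels[x-1] to x.

    labels: pairwise distinct ids (i_1, ..., i_k) or (j_1, ..., j_k).
    The remaining ids are matched to the remaining targets in increasing
    order; any bijective completion would serve equally well.
    """
    perm = {lab: x for x, lab in enumerate(labels, start=1)}
    used_targets = set(perm.values())
    free_targets = [t for t in range(1, size + 1) if t not in used_targets]
    free_sources = [s for s in range(1, size + 1) if s not in perm]
    perm.update(zip(free_sources, free_targets))
    return perm
```

For instance, with processor labels $(3, 1)$ and $n = 4$, the result maps 3 to 1 and 1 to 2, and completes the permutation with 2 to 3 and 4 to 4.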

Finally, Theorems 4.1, 5.3, 6.1 and 7.5 can be easily combined to yield the
following corollary.

Corollary 7.6 Let $n,m \ge 1$. Suppose for all $v \ge 1$, for all unambiguous traces
$\tau$ of $S(n,m,v)$, and for all $1 \le k \le \min(\{n,m\})$, the graph $G(\Omega)(\tau)$ for the
simple witness $\Omega$ does not have a canonical $k$-nice cycle. Then for all $v \ge 1$,
every trace of $S(n,m,v)$ is sequentially consistent.

8 Model checking memory systems 

In this section, we present a model checking algorithm that, given a $k$ such
that $1 \le k \le \min(\{n,m\})$, determines whether there is $v \ge 1$ and a trace $\tau$ in
$S(n,m,v)$ such that the graph $G(\Omega)(\tau)$ for the simple witness $\Omega$ has a canonical
$k$-nice cycle. Corollary 7.6 then allows us to verify sequential consistency on the
memory system $S(n,m,v)$ for all $v \ge 1$ by $\min(\{n,m\})$ such model checking
lemmas.

Automaton $\mathit{Constrain}_k(j)$ for $1 \le j \le k$
  States: $\{a, b\}$
  Initial state: $a$
  Accepting states: $\{b\}$
  Alphabet: $E(n,m,3)$
  Transitions:
    [] $\neg(\mathit{op}(e) = W \wedge \mathit{loc}(e) = j) \rightarrow s' = s$
    [] $s = a \wedge \mathit{op}(e) = W \wedge \mathit{loc}(e) = j \wedge \mathit{data}(e) = 1 \rightarrow s' = a$
    [] $s = a \wedge \mathit{op}(e) = W \wedge \mathit{loc}(e) = j \wedge \mathit{data}(e) = 2 \rightarrow s' = b$
    [] $s = b \wedge \mathit{op}(e) = W \wedge \mathit{loc}(e) = j \wedge \mathit{data}(e) = 3 \rightarrow s' = b$

Automaton $\mathit{Constrain}_k(j)$ for $k < j \le m$
  States: $\{a\}$
  Initial state: $a$
  Accepting states: $\{a\}$
  Alphabet: $E(n,m,3)$
  Transitions:
    [] $\neg(\mathit{op}(e) = W \wedge \mathit{loc}(e) = j) \vee \mathit{data}(e) = 1 \rightarrow s' = s$

Automaton $\mathit{Check}_k(i)$ for $1 \le i \le k$
  States: $\{a, b, \mathit{err}\}$
  Initial state: $a$
  Accepting states: $\{\mathit{err}\}$
  Alphabet: $E(n,m,3)$
  Transitions:
    [] $s = a \wedge \mathit{proc}(e) = i \wedge \mathit{loc}(e) = i \wedge \mathit{data}(e) \in \{2,3\} \rightarrow s' = b$
    [] $s = b \wedge \mathit{proc}(e) = i \wedge \mathit{loc}(e) = i \oplus 1 \wedge (\mathit{data}(e) = 1 \vee (\mathit{op}(e) = W \wedge \mathit{data}(e) = 2)) \rightarrow s' = \mathit{err}$
    [] otherwise $\rightarrow s' = s$

Figure 2: Automata for detecting canonical $k$-nice cycle

We fix some $k$ such that $1 \le k \le \min(\{n,m\})$. We use the symbol $\oplus$ to
denote addition over the additive group with elements $\mathbb{N}_k$ and identity element $k$.
The model checking algorithm makes use of $m$ automata named $\mathit{Constrain}_k(j)$
for $1 \le j \le m$, and $k$ automata named $\mathit{Check}_k(i)$ for $1 \le i \le k$. The automata
are shown in Figure 2. Each automaton refers to a variable $s$ that represents the
state of the automaton. Model checking is performed on the system obtained
by composing these automata with the memory system $S(n,m,3)$.

We now define the regular languages accepted by these automata formally.
In order to be concise, we use tuples of sets to denote the set obtained by
taking the cross-product of the component sets. For example, the 4-tuple
$(\{R,W\},\{1\},\{1\},\{2,3\})$ denotes the set $\{R,W\} \times \{1\} \times \{1\} \times \{2,3\}$. This set
denotes a read event or a write event by processor 1 to location 1 with data value
2 or 3. We further simplify notation and denote this set by $(\{R,W\},1,1,\{2,3\})$.

For all memory locations $1 \le j \le m$, the automaton $\mathit{Constrain}_k(j)$ constrains
the write events to location $j$. If $1 \le j \le k$, then $\mathit{Constrain}_k(j)$ accepts
sequences with zero or more write events to location $j$ with data value 1
followed by exactly one write event to location $j$ with data value 2 followed by zero
or more write events to location $j$ with data value 3. Formally, the automaton
$\mathit{Constrain}_k(j)$ accepts a sequence $\tau$ in $E(n,m,3)^*$ iff the projection of $\tau$ to the
alphabet $(W,\mathbb{N}_n,j,\mathbb{N}_3)$ satisfies the regular expression

$$(W,\mathbb{N}_n,j,1)^* \cdot (W,\mathbb{N}_n,j,2) \cdot (W,\mathbb{N}_n,j,3)^*.$$

If $k < j \le m$, then $\mathit{Constrain}_k(j)$ accepts sequences where all writes to location
$j$ have data value 1. Formally, the automaton $\mathit{Constrain}_k(j)$ accepts a sequence
$\tau$ in $E(n,m,3)^*$ iff the projection of $\tau$ to the alphabet $(W,\mathbb{N}_n,j,\mathbb{N}_3)$ satisfies
the regular expression

$$(W,\mathbb{N}_n,j,1)^*.$$
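
The two forms of $\mathit{Constrain}_k(j)$ can be simulated directly on a finite event sequence. The Python sketch below is an illustration under assumptions: events are tuples `(op, proc, loc, data)` with data values in $\{1,2,3\}$, the function name `constrain_accepts` is hypothetical, and an event with no enabled transition is treated as rejection.

```python
def constrain_accepts(trace, j, k):
    """Simulate Constrain_k(j) on a list of events (op, proc, loc, data)."""
    state = "a"
    for op, _proc, loc, data in trace:
        if not (op == "W" and loc == j):
            continue                  # non-matching events keep the state
        if j > k:
            if data != 1:
                return False          # locations beyond k may only see value 1
        elif state == "a" and data == 1:
            state = "a"
        elif state == "a" and data == 2:
            state = "b"
        elif state == "b" and data == 3:
            state = "b"
        else:
            return False              # no transition is enabled
    # For j <= k only state b accepts; for j > k the single state accepts.
    return state == "b" if j <= k else True
```

A sequence of writes to location $j \le k$ with values 1, 2, 3 is accepted, whereas a sequence that never writes 2, or writes it twice, is not.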

For all $1 \le i \le k$, there is an automaton $\mathit{Check}_k(i)$. The automaton
$\mathit{Check}_k(i)$ accepts a trace $\tau$ if there are events $x$ and $y$ at processor $i$, with
$x$ occurring before $y$, such that $x$ is an event to location $i$ with data value 2
or 3 and $y$ is an event to location $i \oplus 1$ with data value 1 or 2. Moreover,
the event $y$ is required to be a write event if its data value is 2. Formally,

$$
\begin{array}{rl}
\mathit{Check}_k(i) = & (E(n,m,3) \setminus (\{R,W\},i,i,\{2,3\}))^* \cdot \\
 & (\{R,W\},i,i,\{2,3\}) \cdot \\
 & (E(n,m,3) \setminus ((\{R,W\},i,i \oplus 1,1) \cup (W,i,i \oplus 1,2)))^* \cdot \\
 & ((\{R,W\},i,i \oplus 1,1) \cup (W,i,i \oplus 1,2)) \cdot \\
 & E(n,m,3)^*
\end{array}
$$
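
The three-state automaton $\mathit{Check}_k(i)$ can likewise be simulated in a few lines. This Python sketch is an illustration under assumptions: events are tuples `(op, proc, loc, data)`, the function name `check_accepts` is hypothetical, and `i % k + 1` is used to compute $i \oplus 1$.

```python
def check_accepts(trace, i, k):
    """Simulate Check_k(i) on a list of events (op, proc, loc, data)."""
    nxt = i % k + 1                   # i (+) 1 in the group on {1, ..., k}
    state = "a"
    for op, proc, loc, data in trace:
        if state == "a" and proc == i and loc == i and data in (2, 3):
            state = "b"
        elif (state == "b" and proc == i and loc == nxt
              and (data == 1 or (op == "W" and data == 2))):
            state = "err"
        # every other event leaves the state unchanged
    return state == "err"
```

On the four memory events (W,1,1,2), (R,1,2,1), (W,2,2,2), (R,2,1,1) with $k = 2$, both `check_accepts(t, 1, 2)` and `check_accepts(t, 2, 2)` return `True`.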


In order to check for canonical $k$-nice cycles, we compose the memory system
$S(n,m,3)$ with $\mathit{Constrain}_k(j)$ for all $1 \le j \le m$ and with $\mathit{Check}_k(i)$ for all
$1 \le i \le k$. We use a model checker to determine if the resulting system has a
trace in which the initial value of each memory location is 1.

Figure 3: Canonical $k$-nice cycle



Any accepting run of the composed system has $2 \times k$ events which can
be arranged as shown in Figure 3 to yield a canonical $k$-nice cycle. Each
processor $i$ for $1 \le i \le k$ and each location $j$ for $1 \le j \le k$ supplies 2
events. Each event is marked by a 4-tuple denoting the possible values for
that event. The edge labeled by $M(\tau,i)$ is due to the total order imposed
by sequential consistency on the events at processor $i$. The edge labeled by
$\Omega^e(\tau,j)$ is due to the partial order imposed by the simple witness on the events
to location $j$. For example, consider the edge labeled $\Omega^e(\tau,2)$ with the source
event labeled by $(\{R,W\},1,2,1)$ or $(W,1,2,2)$ and the sink event labeled by
$(\{R,W\},2,2,\{2,3\})$. In any run of the composed system, the write events to
location 2 with value 1 occur before the write event with value 2 which occurs
before the write events with value 3. Since $\Omega$ is a simple witness, the partial
order $\Omega^e(\tau,2)$ orders all events labeled with 1 before all events labeled with 2
or 3. Hence any event denoted by $(\{R,W\},1,2,1)$ is ordered before any event
denoted by $(\{R,W\},2,2,\{2,3\})$. Moreover, the unique write event to location 2
with data value 2 is ordered before any other events with value 2 or 3. Hence the
event $(W,1,2,2)$ is ordered before any event denoted by $(\{R,W\},2,2,\{2,3\})$.

We have given an intuitive argument above that a canonical $k$-nice cycle can
be constructed from any run in the composed system. The following theorem
proves that it is necessary and sufficient to check that the composed system has
a run.

Theorem 8.1 For all $n,m \ge 1$, there is $v \ge 1$ and a canonical $k$-nice cycle in
$G(\Omega)(\tau)$ for the simple witness $\Omega$ and an unambiguous trace $\tau$ of $S(n,m,v)$ iff
there is a trace $\tau'$ of $S(n,m,3)$ such that $\mathit{init}(\tau')(j) = 1$ for all $1 \le j \le m$ and
$\tau' \in \mathit{Constrain}_k(j)$ for all $1 \le j \le m$ and $\tau' \in \mathit{Check}_k(i)$ for all $1 \le i \le k$.

Proof: ($\Rightarrow$) Suppose there is a canonical $k$-nice cycle $u_1,v_1,\ldots,u_k,v_k$ in the
graph $G(\Omega)(\tau)$. Then $(u_x,v_x) \in M(\tau,x)$ and $(v_x,u_{x\oplus 1}) \in \Omega^e(\tau,x \oplus 1)$ for
all $1 \le x \le k$. From the definition of $\Omega^e(\tau,x)$, we have that $\mathit{data}(\tau(u_x)) \ne
\mathit{init}(\tau)(x)$ for all $1 \le x \le k$. Therefore, for all $1 \le x \le k$, there is a unique write
event $w_x$ such that $\mathit{data}(\tau(w_x)) = \mathit{data}(\tau(u_x))$.

For all $1 \le j \le m$, let $V_j$ be the set of data values written by the write events
to location $j$ in $\tau$, and let $f_j : V_j \rightarrow \mathbb{N}_{|\tau|}$ be the function such that $f_j(v)$ is the
index of the unique write event to location $j$ with data value $v$. We define a
renaming function $\lambda : \mathbb{N}_m \times \mathbb{N}_v \rightarrow \mathbb{N}_3$ as follows. For all $k < j \le m$ and $x \in \mathbb{N}_v$,
we have $\lambda(j,x) = 1$. For all $1 \le j \le k$ and $x \in \mathbb{N}_v$, we split the definition into
two cases. For $x \in V_j$, we have

$$
\lambda(j,x) = \begin{cases}
1, & \text{if } f_j(x) < w_j \\
2, & \text{if } f_j(x) = w_j \\
3, & \text{if } f_j(x) > w_j.
\end{cases}
$$

For $x \notin V_j$, we have

$$
\lambda(j,x) = \begin{cases}
1, & \text{if } x = \mathit{init}(\tau)(j) \\
3, & \text{if } x \ne \mathit{init}(\tau)(j).
\end{cases}
$$
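
The case analysis defining $\lambda$ translates into a short function. The Python sketch below is an illustration under assumptions: the function name `rename` and the dictionary arguments `write_index` (playing the role of $f_j$), `w` (the distinguished write indices $w_j$) and `init` are hypothetical encodings of the objects named in the proof.

```python
def rename(j, x, k, write_index, w, init):
    """Renaming lambda(j, x) with values in {1, 2, 3}.

    write_index[j]: dict from a written data value to the index of the
                    unique write event with that value (the function f_j).
    w[j]:           index of the distinguished write w_j, for 1 <= j <= k.
    init[j]:        initial value of location j.
    """
    if j > k:
        return 1
    f_j = write_index[j]
    if x in f_j:                      # x is written at location j
        if f_j[x] < w[j]:
            return 1
        return 2 if f_j[x] == w[j] else 3
    return 1 if x == init[j] else 3   # x is never written at location j
```

With writes of the values 5, 7, 9 to location 1 at indices 2, 4, 6 and $w_1 = 4$, the values are renamed to 1, 2 and 3 respectively, and the unwritten initial value is renamed to 1.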

From Assumption 2, there is a trace $\tau'$ of $S(n,m,3)$ such that $\tau' = \lambda^d(\tau)$ and
$\mathit{init}(\tau')(j) = \lambda(j, \mathit{init}(\tau)(j)) = 1$ for all $1 \le j \le m$. In $\tau'$, for every location $j$
such that $1 \le j \le k$ every write event before $w_j$ (including the initial value of $j$)
has the data value 1, the write event at $w_j$ has the data value 2, and the write
events after $w_j$ have the data value 3. Moreover, for every location $j$ such that
$k < j \le m$ every write event has the data value 1. Therefore $\tau' \in \mathit{Constrain}_k(i)$
for all $1 \le i \le k$.

We show that $\tau' \in \mathit{Check}_k(i)$ for all $1 \le i \le k$. Since $(u_i,v_i) \in M(\tau,i)$,
we have that $u_i < v_i$ for all $1 \le i \le k$. We already have that $\mathit{data}(\tau'(u_i)) =
\mathit{data}(\tau'(w_i)) = 2$ for all $1 \le i \le k$. Therefore all we need to show is that for all
$1 \le i \le k$ we have $\mathit{data}(\tau'(v_i)) = 1$ or $\mathit{op}(\tau'(v_i)) = W$ and $\mathit{data}(\tau'(v_i)) = 2$.
Since $(v_i,u_{i\oplus 1}) \in \Omega^e(\tau,i \oplus 1)$, one of the following conditions holds.

1. $\mathit{data}(\tau(v_i)) = \mathit{data}(\tau(u_{i\oplus 1}))$, $\mathit{op}(\tau(v_i)) = W$, and $\mathit{op}(\tau(u_{i\oplus 1})) = R$. We
have that $\mathit{op}(\tau'(v_i)) = \mathit{op}(\tau(v_i)) = W$. Since $\mathit{data}(\tau(v_i)) = \mathit{data}(\tau(u_{i\oplus 1}))$
we have $\mathit{data}(\tau'(v_i)) = \mathit{data}(\tau'(u_{i\oplus 1})) = 2$. Thus, we get $\mathit{op}(\tau'(v_i)) = W$
and $\mathit{data}(\tau'(v_i)) = 2$.

2. $\mathit{data}(\tau(v_i)) = \mathit{init}(\tau)(i \oplus 1)$ and $\mathit{data}(\tau(u_{i\oplus 1})) \ne \mathit{init}(\tau)(i \oplus 1)$. From the
definition of $\lambda$, we get that $\mathit{data}(\tau'(v_i)) = 1$.

3. $\exists a \in L^w(\tau,i \oplus 1)$ such that $(a,w_{i\oplus 1}) \in \Omega(\tau,i \oplus 1)$ and $\mathit{data}(\tau(a)) =
\mathit{data}(\tau(v_i))$. Since $(a,w_{i\oplus 1}) \in \Omega(\tau,i \oplus 1)$ and $\Omega$ is a simple witness we get
$a < w_{i\oplus 1}$. Therefore $\lambda(i \oplus 1, \mathit{data}(\tau(a))) = 1$. Thus $\lambda(i \oplus 1, \mathit{data}(\tau(v_i))) = 1$
and $\mathit{data}(\tau'(v_i)) = 1$.

Thus, in all cases we have that either $\mathit{data}(\tau'(v_i)) = 1$ or $\mathit{op}(\tau'(v_i)) = W$ and
$\mathit{data}(\tau'(v_i)) = 2$. Therefore $\tau' \in \mathit{Check}_k(i)$.
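
($\Leftarrow$) Suppose there is a trace $\tau'$ of $S(n,m,3)$ such that $\mathit{init}(\tau')(j) = 1$ for
all $1 \le j \le m$ and $\tau' \in \mathit{Constrain}_k(j)$ for all $1 \le j \le m$ and $\tau' \in \mathit{Check}_k(i)$
for all $1 \le i \le k$. For all $1 \le i \le k$, let $1 \le u_i < v_i \le |\tau'|$ be such that
the automaton $\mathit{Check}_k(i)$ enters state $b$ for the first time on observing $\tau'(u_i)$
and enters state $\mathit{err}$ for the first time on observing $\tau'(v_i)$. Therefore we have
$\mathit{proc}(\tau'(u_i)) = i$, $\mathit{loc}(\tau'(u_i)) = i$, and $\mathit{data}(\tau'(u_i)) \in \{2,3\}$. We also have
$\mathit{proc}(\tau'(v_i)) = i$, $\mathit{loc}(\tau'(v_i)) = i \oplus 1$, and either $\mathit{data}(\tau'(v_i)) = 1$ or $\mathit{op}(\tau'(v_i)) =
W$ and $\mathit{data}(\tau'(v_i)) = 2$. From Assumption 2, there is a $v \ge 1$, an unambiguous
trace $\tau$ of $S(n,m,v)$, and a renaming function $\lambda : \mathbb{N}_m \times \mathbb{N}_v \rightarrow \mathbb{N}_3$ such that
$\tau' = \lambda^d(\tau)$ and $\lambda(i, \mathit{init}(\tau)(i)) = \mathit{init}(\tau')(i) = 1$ for all $1 \le i \le m$. Therefore,
we get that $\mathit{data}(\tau(u_i)) \ne \mathit{init}(\tau)(i)$ for all $1 \le i \le k$. We will show that
$u_1,v_1,\ldots,u_k,v_k$ is a canonical $k$-nice cycle in $G(\Omega)(\tau)$. Since $\mathit{proc}(\tau(u_i)) =
\mathit{proc}(\tau(v_i)) = i$ and $u_i < v_i$, we have $(u_i,v_i) \in M(\tau,i)$ for all $1 \le i \le k$.
We show that $(v_i,u_{i\oplus 1}) \in \Omega^e(\tau,i \oplus 1)$ for all $1 \le i \le k$. First $\mathit{loc}(\tau(v_i)) =
\mathit{loc}(\tau(u_{i\oplus 1})) = i \oplus 1$. For all $x,y \in L^w(\tau,i)$, if $\lambda(i, \mathit{data}(\tau(x))) < \lambda(i, \mathit{data}(\tau(y)))$
then $x < y$ from the property of $\mathit{Constrain}_k(i)$. There are two cases on $\tau'(v_i)$.

1. $\mathit{data}(\tau'(v_i)) = 1$. We have that $\lambda(i \oplus 1, \mathit{data}(\tau(v_i))) = 1$. There are
$a,b \in L^w(\tau,i \oplus 1)$ such that $\mathit{data}(\tau(a)) = \mathit{data}(\tau(v_i))$ and $\mathit{data}(\tau(b)) =
\mathit{data}(\tau(u_{i\oplus 1}))$. Since $\mathit{data}(\tau'(a)) = 1$ and $\mathit{data}(\tau'(b)) \in \{2,3\}$, we get
from the definition of $\mathit{Constrain}_k(i \oplus 1)$ that $a < b$, or equivalently $(a,b) \in \Omega(\tau,i \oplus 1)$.
Therefore $(v_i,u_{i\oplus 1}) \in \Omega^e(\tau,i \oplus 1)$.

2. $\mathit{op}(\tau'(v_i)) = W$ and $\mathit{data}(\tau'(v_i)) = 2$. We have that $\mathit{op}(\tau(v_i)) = W$.
There is an event $b \in L^w(\tau,i \oplus 1)$ such that $\mathit{data}(\tau(b)) = \mathit{data}(\tau(u_{i\oplus 1}))$.
There are two subcases: $\mathit{data}(\tau'(u_{i\oplus 1})) = 2$ or $\mathit{data}(\tau'(u_{i\oplus 1})) = 3$. In the
first subcase, we have $v_i = b$ since $\mathit{Constrain}_k(i \oplus 1)$ accepts traces with a
single write event labeled with 2. Therefore $\mathit{data}(\tau(v_i)) = \mathit{data}(\tau(u_{i\oplus 1}))$,
$\mathit{op}(\tau(v_i)) = W$ and $\mathit{op}(\tau(u_{i\oplus 1})) = R$, and we get $(v_i,u_{i\oplus 1}) \in \Omega^e(\tau,i \oplus 1)$.
In the second subcase, taking $a = v_i$, since $\mathit{data}(\tau'(a)) = 2$ and $\mathit{data}(\tau'(b)) = 3$, we get
from the definition of $\mathit{Constrain}_k(i \oplus 1)$ that $a < b$, or equivalently $(a,b) \in \Omega(\tau,i \oplus 1)$.
Therefore $(v_i,u_{i\oplus 1}) \in \Omega^e(\tau,i \oplus 1)$.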

Therefore $u_1,v_1,\ldots,u_k,v_k$ is a canonical $k$-nice cycle in $G(\Omega)(\tau)$. ■

Example. We now give an example to illustrate the method described in
this section. Although the memory system in Figure 1 is sequentially consistent,
an earlier version had an error. The assignment owner[j] := 0 was missing in
the guarded command of the action (ACKS, i, j). We modeled the system in
TLA+ [Lam94] and model checked the system configuration with two processors
and two locations using the model checker TLC [YML99]. The error manifests
itself while checking for the existence of a canonical 2-nice cycle. First, the
initial predicate of $S(2,2,3)$ is strengthened by conjoining it with the following
predicate:

$$\forall i \in \mathbb{N}_n, j \in \mathbb{N}_m : (\mathit{cache}[i][j].d = 1).$$

This strengthening ensures that only those traces are examined where the initial
value of every location is 1. Second, automata $\mathit{Constrain}_2(1)$, $\mathit{Constrain}_2(2)$,
$\mathit{Check}_2(1)$ and $\mathit{Check}_2(2)$ (from Figure 2) are composed with $S(2,2,3)$. Finally,
the composed system is analyzed by the model checker TLC. The erroneous
behavior is when the system starts in the initial state with all cache lines in SHD
state and owner[1] = owner[2] = 1, and then executes the following sequence
of 12 events:

1. (ACKX, 2, 2)
2. (UPD, 2)
3. (ACKS, 1, 2)
4. (ACKX, 2, 2)
5. (ACKX, 1, 1)
6. (UPD, 1)
7. (UPD, 1)
8. (W, 1, 1, 2)
9. (R, 1, 2, 1)
10. (UPD, 2)
11. (W, 2, 2, 2)
12. (R, 2, 1, 1)

After event 2, owner[2] = 2, cache[1][2].s = INV, and cache[2][2].s = EXC.
Now processor 1 gets a shared ack message (ACKS, 1, 2) for location 2. Note
that in the erroneous previous version of the example, this event does not set
owner[2] to 0. Consequently owner[2] = 2 and cache[2][2].s = SHD after
event 3. An exclusive ack to processor 2 for location 2 is therefore allowed
to happen at event 4. Since the shared ack message to processor 1 in event 3 is
still sitting in inQ[1], cache[1][2].s is still INV. Therefore event 4 does not
generate an INVAL message to processor 1 for location 2. At event 5, processor 1
gets an exclusive ack message for location 1. This event also inserts an INVAL
message on location 1 in inQ[2] behind the ACKX message on location 2. After
the UPD events to processor 1 in events 6 and 7, we have cache[1][1].s = EXC
and cache[1][2].s = SHD. Processor 1 writes 2 to location 1 and reads 1 from
location 2 in the next two events, thereby sending automaton $\mathit{Check}_2(1)$ to the
state err. Processor 2 now processes the ACKX message to location 2 in the
UPD event 10. Note that processor 2 does not process the INVAL message to
location 1 sitting in inQ[2]. At this point, we have cache[2][1].s = SHD and
cache[2][2].s = EXC. Processor 2 writes 2 to location 2 and reads 1 from
location 1 in the next two events, thereby sending automaton $\mathit{Check}_2(2)$ to the
state err. Since there has been only one write event of data value 2 to each
location, the run is accepted by $\mathit{Constrain}_2(1)$ and $\mathit{Constrain}_2(2)$ also. ■
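
That the four memory events of this run (events 8, 9, 11 and 12) admit no serial reordering can be confirmed by brute force straight from the definition of sequential consistency. The Python sketch below is an illustration under assumptions: events are tuples `(op, proc, loc, data)`, the helper names `is_serial` and `sequentially_consistent` are hypothetical, and exhaustive enumeration is only practical for very short traces.

```python
from itertools import permutations


def is_serial(order, init):
    """True iff every read returns the most recent write (or initial value)."""
    mem = dict(init)
    for op, _proc, loc, data in order:
        if op == "W":
            mem[loc] = data
        elif mem[loc] != data:
            return False
    return True


def sequentially_consistent(trace, init):
    """Search all reorderings that preserve each processor's own order."""
    for perm in permutations(range(len(trace))):
        # a precedes b in perm; if both are at one processor, need a < b.
        respects_program_order = all(
            trace[a][1] != trace[b][1] or a < b
            for pos, a in enumerate(perm) for b in perm[pos + 1:])
        if respects_program_order and is_serial([trace[i] for i in perm], init):
            return True
    return False
```

Processor 1's read of 1 from location 2 must precede processor 2's write of 2 to it, and processor 2's read of 1 from location 1 must precede processor 1's write of 2 to it; together with the two program orders these requirements are circular, so no witness order exists.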

Note that while checking for canonical $k$-nice cycles $\mathit{Constrain}_k(j)$ has 2
states for all $1 \le j \le k$ and 1 state for $k < j \le m$. Also $\mathit{Check}_k(i)$ has 3 states
for all $1 \le i \le k$. Therefore, by composing $\mathit{Constrain}_k(j)$ and $\mathit{Check}_k(i)$ with
the memory system $S(n,m,3)$ we increase the state of the system by a factor of
at most $2^k \times 3^k$. Actually, for all locations $k < j \le m$ we are restricting write
events to have only the data value 1. Therefore, in practice we might reduce
the set of reachable states.

9 Related work



Descriptions of shared-memory systems are parameterized by the number of 
processors, the number of memory locations, and the number of data values. 
The specification for such a system can be either an invariant or a shared- 
memory model. These specifications can be verified for some fixed values of the 
parameters or for arbitrary values of the parameters. The contribution of this 
paper is to provide a completely automatic method based on model checking to 
verify the sequential consistency memory model for fixed parameter values. We 
now describe the related work on verification of shared-memory systems along 
the two axes mentioned above. 

A number of papers have looked at invariant verification. Model checking
has been used for fixed parameter values [MS91, CGH+93, EM95, ID96], while
mechanical theorem proving [LD92, PD96] has been used for arbitrary parameter
values. Methods combining automatic abstraction with model checking
[PD95, Del00] have been used to verify snoopy cache-coherence protocols for
arbitrary parameter values. McMillan [McM01] has used a combination of theorem
proving and model checking to verify the directory-based FLASH cache-coherence
protocol [KOH+94] for arbitrary parameter values. A limitation of
all these approaches is that they do not explicate the formal connection between
the verified invariants and shared-memory model for the protocol.

There are some papers that have looked at verification of shared-memory 
models. Systematic manual proof methods [LLOR99, PSCH98] and theorem 
proving [Aro01] have been used to verify sequential consistency for arbitrary 
parameter values. These approaches require a significant amount of effort on the 
part of the user. Our method is completely automatic and is a good debugging 
technique which can be applied before using these methods. The approach of 
Henzinger et al. [HQR99] and Condon and Hu [CH01] requires a manually 
constructed finite state machine called the serializer. The serializer generates 
the witness total order for each run of the protocol. By model checking the 
system composed of the protocol and the serializer, it can be easily checked that 
the witness total order for every run is a trace of serial memory. This idea is a 
particular instance of the more general "convenient computations" approach of 
Katz and Peled [KP92]. In general, the manual construction of the serializer can 
be tedious and infeasible in the case when unbounded storage is required. Our 
work is an improvement since the witness total order is deduced automatically 
from the simple write order. Moreover, the amount of state we add to the cache- 
coherence protocol in order to perform the model checking is significantly less 
than that added by the serializer approach. The "test model checking" approach 
of Nalumasu et al. [NGMG98] can check a variety of memory models and is 
automatic. Their tests are sound but incomplete for sequential consistency. On 
the other hand, our method offers sound and complete verification for a large 
class of cache-coherence protocols. 
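
The serial-memory condition that a witness total order must satisfy (every read of a location returns the value of the last write to that location) can be sketched in a few lines of Python. This is only a minimal illustration of that condition, not the serializer construction or the model checking algorithm of this paper. The event encoding, the function name `is_serial_memory_trace`, and the choice of 0 as the value read before any write are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event encoding: (kind, processor, location, value),
# where kind is "R" for a read and "W" for a write.
Event = Tuple[str, int, int, int]


def is_serial_memory_trace(events: Iterable[Event], initial_value: int = 0) -> bool:
    """Return True if, in the given total order, every read returns the
    value of the most recent write to the same location.

    Reads that precede all writes to a location are assumed to return
    initial_value (an assumption of this sketch).
    """
    memory: Dict[int, int] = {}
    for kind, _processor, location, value in events:
        if kind == "W":
            memory[location] = value
        elif kind == "R":
            if memory.get(location, initial_value) != value:
                return False
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return True
```

A candidate order such as `[("W", 1, 1, 5), ("R", 2, 1, 5)]` passes this check, while `[("W", 1, 1, 5), ("R", 2, 1, 7)]` does not.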

Recently Glusman and Katz [GK01] have shown that, in general, interpreting 
sequential consistency over finite traces is not equivalent to interpreting it 
over infinite traces. They have proposed conditions on shared-memory systems 
under which the two are equivalent. Their work is orthogonal to ours and a 
combination of the two will allow verification of sequential consistency over infinite 
traces for finite parameter values. 

10 Conclusions 

We now put the results of this paper in perspective. Assumption 1 about causal- 
ity and Assumption 2 about data independence are critical to our result that 
reduces the problem of verifying sequential consistency to model checking. 
Assumption 3 about processor symmetry and Assumption 4 about location 
symmetry are used to reduce the number of model checking lemmas to min({n, m}) 
rather than exponential in n and m. 

In this paper, the read and write events have been modeled as atomic events. 
In most real machines, each read or write event is broken into two separate events: 
a request from the processor to the cache, and a response from the cache to 
the processor. Any memory model including sequential consistency naturally 
specifies a partial order on the requests. If the memory system services processor 
requests in order then the order of requests is the same as the order of responses. 
In this case, the method described in this paper can be used by identifying the 
atomic read and write events with the responses. The case when the memory 
system services requests out of order is not handled by this paper. 
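
The in-order condition in the paragraph above can be checked on a recorded log with a short Python sketch. The log format, the "req"/"resp" tags, and the function name `services_in_order` are hypothetical and chosen only for illustration. The sketch checks, per processor, that responses arrive in the same order as the corresponding requests; it does not check that each response follows its request in time.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical log entry: (kind, processor, event_id),
# where kind is "req" or "resp".
LogEntry = Tuple[str, int, Hashable]


def services_in_order(log: Iterable[LogEntry]) -> bool:
    """Return True if, at every processor, the sequence of responses is a
    prefix of the sequence of requests (requests are serviced in order)."""
    requests: Dict[int, List[Hashable]] = defaultdict(list)
    responses: Dict[int, List[Hashable]] = defaultdict(list)
    for kind, processor, event_id in log:
        if kind == "req":
            requests[processor].append(event_id)
        elif kind == "resp":
            responses[processor].append(event_id)
        else:
            raise ValueError(f"unknown entry kind: {kind!r}")
    processors = set(requests) | set(responses)
    return all(
        responses[p] == requests[p][: len(responses[p])] for p in processors
    )
```

When this check holds for a system, the atomic read and write events of the method can be identified with the responses, as described above.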

The model checking algorithm described in the paper is sound and complete 
with respect to a simple witness for the memory system. In some protocols, for 
example the lazy caching protocol [ABM93], the correct witness is not simple. 
But the basic method described in the paper where data values of writes are 
constrained by automata can still be used if ordering decisions about writes can 
be made before the written values are read. The lazy caching protocol has this 
property and extending the methods described in the paper to handle it is part 
of our future work. We would also like to extend our work to handle other 
memory models. 